May 9, 2016

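Final project

Groups / presentation dates – seniors

  • 5/30: New hotels in Taipei and New Taipei: you're the money boss *3 – types, where guests come from, influencing factors (transportation? booking platforms?)

  • 6/6:
    • Road speed data analysis for Taipei City *4 – location/time/weather
    • Taipei's golden cycling routes *2 – location/facilities/air pollution
    • Statistical analysis of traffic accidents in Taiwan *4 – time/vehicle type/severity/cause/location
    • 小老婆汽機車資訊網 (car and scooter information site): scooter survey *3 – FB reviews/accidents (severity, type...)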

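Groups / presentation dates – scheduled by homework submission time

  • 6/13:
    • Marvel 101 *1 – character abilities/number of appearances/number of comic issues (correlations? influencing factors?)
    • Are earthquakes really related to rainfall? *2 – rainfall/earthquakes (epicenter location/depth)
    • Education level and occupations *3 – education/employment (major, age, gender)
    • Facebook fan page analysis *3 – how likes and comment counts relate to the comments themselves (classifying comments?)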

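Groups / presentation dates – sports topics

  • 6/20:
    • Are there fewer Taiwanese? (Taiwan's population crisis) *5 – age structure/sex ratio/migration in and out (factors: location and job opportunities? childbirth subsidies? cost of living?)
    • NBA overview analysis *4 – games played (salary? minutes played? points? actual contribution?)
    • Kobe's basketball career *2 – age/stats (head coaches? teammates?)
    • MLB analysis *3 – player salaries/ERA/on-base percentage/batting average (computing a fair salary?)
    • CPBL *4 – batting average/on-base percentage/error rate/winning percentage (what drives wins? pitchers?)
    • Baseball references
    • Basketball references

New data sources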

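Final report rules

  • Presentation: 15-20 minutes, plus 5 minutes of questions. Stick to the key points; timing is strict

  • Each group hands in one written report covering:
    • Group members and division of work (affects grades)
    • The data analysis report
    • Discussion of the data / difficulties encountered
    • Answers to the questions asked during the presentation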

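Homework issues

  • If the data are updated "every five minutes" or "every day":
    • write a program, or collect them by hand, to retrieve the data for the time range you need
  • Data format: a table or a chart? If the data already come as a chart, there is nothing left to report on.

  • If the data are JSON or XML, test as early as possible that they can actually be read in

  • The complexity of the homework should roughly match the number of group members; points will be added or deducted accordingly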
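For data that update every five minutes or every day, the usual first step is to keep only the time window you actually need. A minimal base-R sketch, assuming each record carries a timestamp; the column names time and value and the simulated readings are hypothetical:

```r
# Simulated 5-minute readings for one day (hypothetical data)
readings <- data.frame(
  time  = seq(as.POSIXct("2016-05-01 00:00", tz = "Asia/Taipei"),
              by = "5 min", length.out = 288),   # 288 = 24 hours * 12 per hour
  value = seq_len(288)
)

# Keep only the window we need, here 08:00 to 09:00 inclusive
from <- as.POSIXct("2016-05-01 08:00", tz = "Asia/Taipei")
to   <- as.POSIXct("2016-05-01 09:00", tz = "Asia/Taipei")
selected <- readings[readings$time >= from & readings$time <= to, ]

nrow(selected)   # 13 readings: 08:00, 08:05, ..., 09:00
```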

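ggplot2 can draw many more kinds of charts

Choropleth maps

choroplethr & choroplethrMaps packages

Choropleth map

  • Draws statistics as colors on the corresponding areas of a map
  • choroplethr & choroplethrMaps packages
  • Tools built on the ggplot2 package specifically for drawing choropleth maps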
if (!require('choroplethr')){
    install.packages("choroplethr")
    library(choroplethr) ## for state_choropleth()
}
if (!require('choroplethrMaps')){ 
    install.packages("choroplethrMaps") ## this package was not installed last time
    library(choroplethrMaps) ## for state_choropleth()
}
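Under the hood, a choropleth map turns each area's value into a fill color. A minimal base-R sketch of that idea: made-up values for hypothetical areas are binned into five equal-width classes with cut(), and each class gets a color from a white-to-red ramp built with colorRampPalette(). choroplethr's own binning rules may differ; this only illustrates the principle.

```r
# Made-up values for five hypothetical areas
values <- c(A = 3, B = 12, C = 25, D = 41, E = 50)

# Split the value range into 5 equal-width classes (returns class numbers 1-5)
classes <- cut(values, breaks = 5, labels = FALSE)

# A 5-step color ramp from white to dark red
pal <- colorRampPalette(c("white", "darkred"))(5)

# Each area's fill color is the palette entry of its class
fills <- setNames(pal[classes], names(values))
fills
```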

Population of US states

Uses the choroplethr & choroplethrMaps packages; remember to load them first

data(df_pop_state) # population of each state
state_choropleth(df_pop_state) # draw each state's population on the map
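state_choropleth() expects a data frame with a region column holding lower-case state names and a numeric value column, which is the shape of df_pop_state. As a sketch under that assumption, the same structure can be built from base R's state.x77 dataset. Note that state.x77 holds 1975 population estimates in thousands, so these are not the same numbers as df_pop_state.

```r
# Build a region/value data frame from base R's built-in state data
my_states <- data.frame(
  region = tolower(state.name),                  # e.g. "alabama"
  value  = state.x77[, "Population"] * 1000,     # thousands -> people
  stringsAsFactors = FALSE
)
rownames(my_states) <- NULL
head(my_states, 3)

# With choroplethr loaded, this could then be drawn with:
# state_choropleth(my_states)
```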

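Choropleth map of Taiwan

  • Process the map files with the rgdal, rgeos and maptools packages
  • Draw with ggplot2 & RColorBrewer

Choropleth map of Taiwan

  • Let's draw one with Taiwan's open data
  • There is no convenient package for this yet,
  • so we have to build it from scratch
  • Download Taiwan's map data from the Government Open Data Platform
  • Township/district boundaries (TWD97 longitude/latitude)
  • Unzip the downloaded archive and put the folder inside your project folder
  • Format: shapefile (see Wikipedia)
  • An open format for spatial data
  • References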
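A shapefile is really a set of files sharing one base name, and the .shp (geometry), .shx (index) and .dbf (attribute table) parts are all required. Before reading it, it can help to check that the unzipped folder is complete. A small sketch; the helper name check_shapefile() is hypothetical and it only looks for lower-case extensions:

```r
# Hypothetical helper: TRUE if the mandatory shapefile parts all exist
check_shapefile <- function(shp_path) {
  base <- sub("\\.shp$", "", shp_path, ignore.case = TRUE)
  needed <- paste0(base, c(".shp", ".shx", ".dbf"))
  missing <- needed[!file.exists(needed)]
  if (length(missing) > 0) {
    warning("Missing shapefile parts: ", paste(basename(missing), collapse = ", "))
  }
  length(missing) == 0
}

# Example call with the path used in this lecture:
# check_shapefile("Taiwan/Town_MOI_1041215.shp")
```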

Reading the shapefile into R

Use the readShapeSpatial function from the maptools package

if (!require('rgdal')){#for fortify()
    install.packages("rgdal");library(rgdal)
}
if (!require('rgeos')){#for fortify()
    install.packages("rgeos");library(rgeos) 
}
if (!require('maptools')){#for readShapeSpatial()
    install.packages("maptools");library(maptools) 
}
tw_new <- readShapeSpatial("Taiwan/Town_MOI_1041215.shp") # path to the .shp file
names(tw_new)
##  [1] "OBJECTID"   "T_UID"      "Town_ID"    "T_Name"     "T_Desc"    
##  [6] "Add_Date"   "Add_Accept" "Remark"     "County_ID"  "C_Name"

Processing the shapefile - 1

  • Requires rgdal, rgeos, maptools
  • fortify(): converts a shapefile (Spatial) object into a data.frame
  • References
library(ggplot2) # for fortify() and ggplot()
head(tw_new$Town_ID)
## [1] 1001402 1001321 1000913 1001411 1001416 1000712
## 368 Levels: 0900701 0900702 0900703 0900704 0902001 0902002 ... 6801300
tw_new.df <- fortify(tw_new, region = "T_UID") #from ggplot2 package
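fortify() flattens every polygon into one row per vertex: order gives the drawing order, id is the region key later used by merge(), and group identifies each ring (piece) of a region. A minimal base-R sketch of that idea for toy polygons stored as coordinate matrices; the helper polygons_to_df() and the toy shapes are hypothetical and much simpler than what ggplot2 does (no holes, for example):

```r
# Hypothetical helper: flatten a named list of regions into a vertex table.
# Each region is a list of rings (pieces); each ring is an n x 2 matrix of long/lat.
polygons_to_df <- function(polys) {
  rows <- lapply(names(polys), function(id) {
    pieces <- polys[[id]]
    df <- do.call(rbind, lapply(seq_along(pieces), function(p) {
      data.frame(long = pieces[[p]][, 1], lat = pieces[[p]][, 2], piece = p)
    }))
    df$order <- seq_len(nrow(df))               # drawing order within the region
    df$id <- id                                 # region key used by merge()
    df$group <- paste(id, df$piece, sep = ".")  # one group per ring
    df
  })
  do.call(rbind, rows)
}

square <- cbind(c(0, 1, 1, 0, 0), c(0, 0, 1, 1, 0))
toy <- list("1" = list(square), "2" = list(square + 2, square + 4))
toy_df <- polygons_to_df(toy)
head(toy_df)
```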

Processing the shapefile - 2

head(tw_new.df,10)
##        long      lat order  hole piece id group
## 1  119.9170 26.17518     1 FALSE     1  1   1.1
## 2  119.9171 26.17517     2 FALSE     1  1   1.1
## 3  119.9171 26.17518     3 FALSE     1  1   1.1
## 4  119.9171 26.17518     4 FALSE     1  1   1.1
## 5  119.9171 26.17518     5 FALSE     1  1   1.1
## 6  119.9172 26.17518     6 FALSE     1  1   1.1
## 7  119.9172 26.17518     7 FALSE     1  1   1.1
## 8  119.9172 26.17518     8 FALSE     1  1   1.1
## 9  119.9173 26.17515     9 FALSE     1  1   1.1
## 10 119.9173 26.17515    10 FALSE     1  1   1.1

Making fake data to plot: the fill-value table

# make up some data to plot
# prevalence is random: rnorm(number of values needed)
mydata<-data.frame(NAME_2=tw_new$T_Name, id=tw_new$T_UID,
                   prevalence=rnorm(length(tw_new$T_UID)))
head(mydata)
##                   NAME_2  id prevalence
## 1 \xa6\xa8\xa5\\\xc2\xed 178 -1.0704142
## 2            \xa8ΥV\xb6m 164 -1.6089234
## 3     \xb3\xc1\xbcd\xb6m 118  2.0302805
## 4     \xba\xf1\xaeq\xb6m 376  1.6737724
## 5  \xc4\xf5\xc0\xac\xb6m 369  0.5319908
## 6      \xa5Ф\xa4\xc2\xed  78  0.7070405
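Because rnorm() draws new random numbers on every run, the fake map looks different each time. Calling set.seed() first makes the fake data reproducible; a small sketch (the seed value 2016 is arbitrary):

```r
set.seed(2016)            # any fixed seed works; 2016 is arbitrary
first  <- rnorm(5)
set.seed(2016)
second <- rnorm(5)
identical(first, second)  # TRUE: same seed, same "random" values
```

Placing a set.seed() call once before the data.frame() line above would give the same fake prevalence values on every run.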

Chinese text encoding

# use iconv to turn the unreadable byte codes (\xa6\xa8\xa5\\\xc2\xed) into readable Chinese
# from Big5 to UTF-8
mydata$NAME_2<-iconv(as.character(mydata$NAME_2), # NAME_2 was originally a factor
                     from="big5", to = "UTF-8")
head(mydata,10)
##    NAME_2  id  prevalence
## 1  成功鎮 178 -1.07041422
## 2  佳冬鄉 164 -1.60892338
## 3  麥寮鄉 118  2.03028055
## 4  綠島鄉 376  1.67377244
## 5  蘭嶼鄉 369  0.53199077
## 6  田中鎮  78  0.70704051
## 7  社頭鄉  83  1.18218352
## 8  竹田鄉 157  0.35104423
## 9  萬丹鄉 148 -0.06741472
## 10 三灣鄉  64 -0.27611486
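The garbled name 成功鎮 printed earlier contains a backslash because in Big5 the second byte of 功 is 0x5C, the same byte as an ASCII backslash. A small round-trip sketch showing the Big5 bytes behind that name. It assumes your system's iconv supports the "BIG5" encoding (common on Linux and macOS) and writes the name with Unicode escapes so the script itself is encoding-safe:

```r
town <- "\u6210\u529f\u93ae"   # 成功鎮, written with Unicode escapes

# Encode to Big5 and inspect the raw bytes
big5_bytes <- iconv(town, from = "UTF-8", to = "BIG5", toRaw = TRUE)[[1]]
big5_bytes   # a6 a8 a5 5c c2 ed  (0x5c is the backslash byte)

# Decode the raw Big5 bytes back to UTF-8
back <- iconv(list(big5_bytes), from = "BIG5", to = "UTF-8")
identical(back, town)
```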

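Merging the map data with the fill-value table

# finally, merge the fake prevalence data (mydata) with the coordinates (tw_new.df) using merge()
final.plot<-merge(tw_new.df,mydata,by="id",all.x=TRUE)
# merge() can reorder rows; restore each region's vertex drawing order
final.plot<-final.plot[order(final.plot$id, final.plot$order),]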
head(final.plot,10)
##    id     long      lat order  hole piece group NAME_2 prevalence
## 1   1 119.9170 26.17518     1 FALSE     1   1.1 南竿鄉  -2.277062
## 2   1 119.9171 26.17517     2 FALSE     1   1.1 南竿鄉  -2.277062
## 3   1 119.9171 26.17518     3 FALSE     1   1.1 南竿鄉  -2.277062
## 4   1 119.9171 26.17518     4 FALSE     1   1.1 南竿鄉  -2.277062
## 5   1 119.9171 26.17518     5 FALSE     1   1.1 南竿鄉  -2.277062
## 6   1 119.9172 26.17518     6 FALSE     1   1.1 南竿鄉  -2.277062
## 7   1 119.9172 26.17518     7 FALSE     1   1.1 南竿鄉  -2.277062
## 8   1 119.9172 26.17518     8 FALSE     1   1.1 南竿鄉  -2.277062
## 9   1 119.9173 26.17515     9 FALSE     1   1.1 南竿鄉  -2.277062
## 10  1 119.9173 26.17515    10 FALSE     1   1.1 南竿鄉  -2.277062
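all.x = TRUE makes merge() a left join: every row of the first table (the vertices) is kept, and rows whose id has no match in the second table get NA in the added columns; ggplot2 then fills such polygons with the scale's na.value color (grey by default). A toy illustration with made-up tables:

```r
vertices <- data.frame(id = c("1", "1", "2", "3"), long = c(0, 1, 5, 9))
values   <- data.frame(id = c("1", "2"), prevalence = c(0.5, -1.2))

left  <- merge(vertices, values, by = "id", all.x = TRUE)  # keep all vertices
inner <- merge(vertices, values, by = "id")                # drop unmatched ids

nrow(left)   # 4: region "3" kept, with prevalence NA
nrow(inner)  # 3: region "3" dropped
```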

Drawing the Taiwan choropleth map - 1

library(RColorBrewer) # color palette: brewer.pal(9, "Reds")
twcmap<-ggplot() +
    geom_polygon(data = final.plot, 
                 aes(x = long, y = lat, group = group, 
                     fill = prevalence), 
                 color = "black", size = 0.25) + 
    coord_map()+ # keep the map's aspect ratio (coord_map() needs the mapproj package)
    scale_fill_gradientn(colours = brewer.pal(9,"Reds"))+
    theme_void()+
    labs(title="Prevalence of X in Taiwan")

Drawing the Taiwan choropleth map - 2

twcmap

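Using Google Maps

ggmap package

ggmap: loading a Google map

if (!require('ggmap')){
    install.packages("ggmap")
    library(ggmap)#for get_map()
}
# Note: ggmap 3.0 and later need a Google Maps API key, set with register_google(key = ...), before fetching Google tiles
twmap <- get_map(location = 'Taiwan', zoom = 7,language = "zh-TW")
# location: a place name, or longitude/latitude coordinates
# zoom: zoom level, an integer from 3 (continent) to 21 (building)
# language: language of the map labels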

ggmap: plotting

ggmap(twmap) # returns a ggplot2 object, so layers can be added the same way

ggmap: maptype

# maptype: map style: "terrain", "terrain-background", "satellite",
# "roadmap", "hybrid" (Google Maps), "terrain", "watercolor",
# "toner" (Stamen maps),
# or a positive integer for CloudMade maps (see ?get_cloudmademap)
TaipeiMap <- get_map(location = c(121.43,24.93,121.62,25.19), # bounding box: left, bottom, right, top
                    zoom = 14, maptype = 'roadmap')
ggmap(TaipeiMap,extent = 'device') # extent = 'device': the map fills the whole plotting area

ggmap + choropleth map

ggmap(twmap)+ # base map from ggmap
    geom_polygon(data = final.plot,  # choropleth layer
        aes(x = long, y = lat, group = group, fill = prevalence), 
        color = "grey80", size = 0.1,alpha = 0.5) + 
    scale_fill_gradientn(colours = brewer.pal(9,"Reds"))

Density map? US population density

Drawing a population density map with ggplot2 + ggmap - preprocessing 1

# center coordinates of each US state (base R's state.center)
StateCenter<-data.frame( 
    region=tolower(state.name),lon=state.center$x,lat=state.center$y)
head(StateCenter,1)
##    region      lon     lat
## 1 alabama -86.7509 32.5901
# population of each US state, joined with the state centers
StatePop<-merge(df_pop_state,StateCenter,by="region") 
head(StatePop,1)
##    region   value      lon     lat
## 1 alabama 4777326 -86.7509 32.5901

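Drawing a population density map with ggplot2 + ggmap - preprocessing 2

# turn population counts into points (important!)
PopPoint<-NULL 
for(i in 1:nrow(StatePop)){
    # one point per million people (states under one million still get one point)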
    for(j in 1:(StatePop[i,"value"]/1000000)){
        PopPoint<-rbind(PopPoint,StatePop[i,])   
    }
}
head(PopPoint,3)
##    region   value      lon     lat
## 1 alabama 4777326 -86.7509 32.5901
## 2 alabama 4777326 -86.7509 32.5901
## 3 alabama 4777326 -86.7509 32.5901
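Growing PopPoint with rbind() inside a double loop rebuilds the data frame on every iteration, which gets slow. The same rows can be produced in one step by repeating row indices with rep(). Since 1:(value/1000000) yields floor(value/1e6) values for states with at least one million people and a single value for smaller states, the sketch mirrors that rule with pmax(1, floor(...)). The toy StatePop values are made up:

```r
StatePop <- data.frame(
  region = c("state_a", "state_b", "state_c"),
  value  = c(4800000, 560000, 1500000),   # made-up populations
  lon    = c(-86.8, -107.3, -114.7),
  lat    = c(32.6, 43.0, 44.6)
)

# Points per state, matching the loop's 1:(value/1e6) behaviour
n_points <- pmax(1, floor(StatePop$value / 1e6))   # 4, 1, 1

# Repeat each row n_points times in a single subsetting step
PopPoint <- StatePop[rep(seq_len(nrow(StatePop)), times = n_points), ]
table(PopPoint$region)
```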

Drawing a population density map with ggplot2 + ggmap - plotting

USMap <- get_map(location = "United States", zoom = 4)
densityMap<-ggmap(USMap, extent = "device") + 
    geom_density2d(data = PopPoint, aes(x = lon, y = lat), size = 0.3) + 
    stat_density2d(data = PopPoint, 
            aes(x = lon, y = lat, fill = ..level.., alpha = ..level..), 
                size = 0.01, bins = 16, geom = "polygon") + 
    scale_fill_gradient(low = "green", high = "red", guide = FALSE) + 
    scale_alpha(range = c(0, 0.3), guide = FALSE)

US population density map

densityMap

ggmap references

Heatmap: reading the data first

Reference

# read the .csv file
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
head(nba)
##             Name  G  MIN  PTS  FGM  FGA   FGP FTM FTA   FTP X3PM X3PA
## 1   Dwyane Wade  79 38.6 30.2 10.8 22.0 0.491 7.5 9.8 0.765  1.1  3.5
## 2  LeBron James  81 37.7 28.4  9.7 19.9 0.489 7.3 9.4 0.780  1.6  4.7
## 3   Kobe Bryant  82 36.2 26.8  9.8 20.9 0.467 5.9 6.9 0.856  1.4  4.1
## 4 Dirk Nowitzki  81 37.7 25.9  9.6 20.0 0.479 6.0 6.7 0.890  0.8  2.1
## 5 Danny Granger  67 36.2 25.8  8.5 19.1 0.447 6.0 6.9 0.878  2.7  6.7
## 6  Kevin Durant  74 39.0 25.3  8.9 18.8 0.476 6.1 7.1 0.863  1.3  3.1
##    X3PP ORB DRB TRB AST STL BLK  TO  PF
## 1 0.317 1.1 3.9 5.0 7.5 2.2 1.3 3.4 2.3
## 2 0.344 1.3 6.3 7.6 7.2 1.7 1.1 3.0 1.7
## 3 0.351 1.1 4.1 5.2 4.9 1.5 0.5 2.6 2.3
## 4 0.359 1.1 7.3 8.4 2.4 0.8 0.8 1.9 2.2
## 5 0.404 0.7 4.4 5.1 2.7 1.0 1.4 2.5 3.1
## 6 0.422 1.0 5.5 6.5 2.8 1.3 0.7 3.0 1.8

Heatmap data processing: wide to long

library(reshape2) # for melt()
nba.m <- melt(nba,id.vars = "Name") # wide to long, keyed on Name
head(nba.m,10)
##                Name variable value
## 1      Dwyane Wade         G    79
## 2     LeBron James         G    81
## 3      Kobe Bryant         G    82
## 4    Dirk Nowitzki         G    81
## 5    Danny Granger         G    67
## 6     Kevin Durant         G    74
## 7     Kevin Martin         G    51
## 8     Al Jefferson         G    50
## 9       Chris Paul         G    78
## 10 Carmelo Anthony         G    66
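melt() stacks every column except the id column into variable/value pairs. A base-R sketch of the same reshaping with stack(), on a tiny made-up table; it mirrors what melt() produces here but is not reshape2's implementation:

```r
wide <- data.frame(Name = c("A", "B"), G = c(79, 81), PTS = c(30.2, 28.4))

long <- stack(wide[, -1])                            # columns: values, ind
long$Name <- rep(wide$Name, times = ncol(wide) - 1)  # one Name per stacked column
long <- long[, c("Name", "ind", "values")]
names(long) <- c("Name", "variable", "value")
long
```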

Heatmap geom_tile()

library(ggplot2) #for ggplot()
ggplot(nba.m, aes(variable, Name)) + #aes(x,y)
    geom_tile(aes(fill = value),colour = "white")+ # geom_tile: one colored tile per cell
    scale_fill_gradient(low = "white",high = "steelblue") # low values white, high values steel blue

Heatmap preprocessing: scale

head(nba,2)
##            Name  G  MIN  PTS  FGM  FGA   FGP FTM FTA   FTP X3PM X3PA  X3PP
## 1  Dwyane Wade  79 38.6 30.2 10.8 22.0 0.491 7.5 9.8 0.765  1.1  3.5 0.317
## 2 LeBron James  81 37.7 28.4  9.7 19.9 0.489 7.3 9.4 0.780  1.6  4.7 0.344
##   ORB DRB TRB AST STL BLK  TO  PF
## 1 1.1 3.9 5.0 7.5 2.2 1.3 3.4 2.3
## 2 1.3 6.3 7.6 7.2 1.7 1.1 3.0 1.7
nba[,2:21]<-apply(nba[,2:21], 2, scale) # standardize each column to mean 0 and SD 1
head(nba,2)
##            Name         G       MIN      PTS      FGM      FGA       FGP
## 1  Dwyane Wade  0.6179300 1.0019702 3.179941 2.920022 2.596832 0.5136017
## 2 LeBron James  0.7693834 0.6119299 2.566974 1.957185 1.697237 0.4649190
##        FTM      FTA        FTP       X3PM      X3PA        X3PP
## 1 1.917475 2.110772 -0.7401673 -0.1080044 0.1303647 -0.15749098
## 2 1.778729 1.896589 -0.5233214  0.4920201 0.6971679  0.02738974
##           ORB        DRB        TRB      AST      STL       BLK       TO
## 1 -0.27213551 -0.3465676 -0.3287465 1.652247 2.558238 1.2064646 1.790445
## 2 -0.06117775  1.0080940  0.6605370 1.516147 1.367252 0.8627425 1.059651
##           PF
## 1 -0.2984568
## 2 -1.3903719
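scale() standardizes: it subtracts each column's mean and divides by its standard deviation, so every statistic ends up with mean 0 and SD 1. That is what lets games played, points and percentages share one color scale in the heatmap. A quick check on a made-up vector:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

z_manual <- (x - mean(x)) / sd(x)   # z-score by hand
z_scale  <- as.vector(scale(x))     # scale() returns a 1-column matrix

all.equal(z_manual, z_scale)  # TRUE
round(mean(z_scale), 10)      # 0
```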

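Remember apply()?

It works somewhat like a for loop over rows or columns

  • apply(Data, MARGIN, FUN, …)
    • Data: a matrix or data frame (a data frame is converted to a matrix first)
    • MARGIN: 1 = rows, 2 = columns
    • FUN: the function to apply
    • …: extra arguments passed on to FUN
 # standardize each column of nba: mean 0, SD 1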
apply(nba[,2:21], 2, scale)
##                 G          MIN         PTS         FGM         FGA
##  [1,]  0.61793000  1.001970241  3.17994147  2.92002219  2.59683207
##  [2,]  0.76938343  0.611929922  2.56697353  1.95718513  1.69723666
##  [3,]  0.84511015 -0.038137276  2.02211313  2.04471578  2.12561543
##  [4,]  0.76938343  0.611929922  1.71562916  1.86965449  1.74007454
##  [5,] -0.29079059 -0.038137276  1.68157539  0.90681744  1.35453365
##  [6,]  0.23929642  1.175321493  1.51130651  1.25694001  1.22602002
##  [7,] -1.50241804  0.828618988  1.27293009 -0.66873410 -0.01627839
##  [8,] -1.57814476  0.135213977  0.76212347  1.95718513  1.52588516
##  [9,]  0.54220328  0.958632428  0.65996215  0.55669488  0.06939736
## [10,] -0.36651730 -0.774880100  0.65996215  0.55669488  1.01183064
## [11,]  0.46647657  0.785281175  0.62590838  0.46916424  0.19791099
## [12,]  0.54220328  0.395240856  0.59185460  0.55669488  0.41210037
## [13,]  0.76938343  0.828618988  0.45563950  0.73175616  0.79764126
## [14,]  0.08784299 -0.948231352  0.38753195  1.25694001  0.66912763
## [15,] -1.35096461  0.221889603  0.18320931  0.11904167 -0.78736017
## [16,]  0.61793000  1.392010559  0.18320931  0.29410295  0.88331701
## [17,] -0.13933716 -0.081475089  0.14915553 -0.75626474 -0.35898140
## [18,] -2.86549892  0.048538351  0.11510176  0.03151103  0.28358674
## [19,]  0.39074985  1.305334933  0.04699421  0.46916424  0.45493825
## [20,] -1.57814476 -0.514853221 -0.02111334  0.73175616  0.66912763
## [21,] -0.29079059  1.001970241 -0.02111334 -0.14355025  0.11223524
## [22,]  0.69365672  0.221889603 -0.02111334 -0.05601961  0.36926249
## [23,] -0.89660431  1.478686186 -0.05516711 -0.40614217  0.41210037
## [24,]  0.84511015  0.135213977 -0.05516711 -0.14355025  0.02655948
## [25,]  0.61793000 -0.254826342 -0.08922089 -0.31861153 -1.51560407
## [26,]  0.76938343  0.481916483 -0.12327466 -0.66873410 -0.57317079
## [27,]  0.16356971 -0.601528847 -0.25948976 -0.14355025  0.28358674
## [28,] -0.44224402  0.785281175 -0.39570486 -0.93132602 -0.10195415
## [29,]  0.46647657 -1.164920418 -0.39570486 -0.05601961 -1.08722530
## [30,]  0.84511015 -0.168150715 -0.42975863 -0.84379538 -0.44465716
## [31,]  0.23929642 -1.164920418 -0.42975863 -0.14355025 -0.05911627
## [32,] -0.21506387  0.265227417 -0.49786618 -0.58120346 -0.61600866
## [33,]  0.31502314 -1.121582605 -0.53191996 -0.05601961 -0.48749503
## [34,] -3.47131265 -0.298164155 -0.63408128  0.29410295  0.54061400
## [35,]  0.61793000  0.438578669 -0.66813506 -0.23108089  0.02655948
## [36,]  0.76938343  0.351903043 -0.66813506 -0.14355025 -1.30141468
## [37,]  0.84511015  1.522023999 -0.70218883 -0.75626474 -0.83019804
## [38,] -1.50241804 -2.248365748 -0.77029638 -1.54404051 -1.51560407
## [39,]  0.84511015  0.741943362 -0.80435016 -0.49367282 -0.14479202
## [40,]  0.61793000  0.525254296 -0.87245770 -0.84379538 -0.91587380
## [41,] -0.29079059 -0.991569166 -0.87245770 -0.40614217 -0.14479202
## [42,]  0.61793000  0.005200538 -0.90651148 -1.01885666 -1.17290106
## [43,]  0.76938343  0.351903043 -0.94056525 -0.05601961 -0.27330565
## [44,] -1.42669132 -1.901663242 -0.97461903 -0.58120346 -0.35898140
## [45,]  0.76938343 -0.558191034 -1.04272658 -0.84379538 -0.87303592
## [46,]  0.31502314 -2.681743880 -1.04272658 -0.58120346 -2.02965858
## [47,]  0.61793000 -0.038137276 -1.07678035 -1.19391794 -0.91587380
## [48,]  0.61793000 -0.428177594 -1.07678035 -1.98169371 -1.51560407
## [49,] -1.04805775  0.178551790 -1.14488790 -1.19391794 -0.57317079
## [50,]  0.23929642 -2.768419506 -1.24704922 -1.19391794 -0.87303592
##               FGP         FTM         FTA         FTP       X3PM
##  [1,]  0.51360167  1.91747526  2.11077208 -0.74016734 -0.1080044
##  [2,]  0.46491905  1.77872886  1.89658922 -0.52332144  0.4920201
##  [3,] -0.07058980  0.80750405  0.55794635  0.57536445  0.2520103
##  [4,]  0.22150593  0.87687725  0.45085492  1.06688183 -0.4680191
##  [5,] -0.55741603  0.87687725  0.55794635  0.89340511  1.8120740
##  [6,]  0.14848200  0.94625045  0.66503778  0.67655921  0.1320054
##  [7,] -1.21463144  2.95807327  2.37850066  0.73438478  1.3320544
##  [8,]  0.65964954 -0.71870636 -0.45942223 -1.13048996 -1.4280583
##  [9,]  0.80569741  0.73813085  0.45085492  0.74884117 -0.4680191
## [10,] -0.65478128  0.59938445  0.66503778 -0.33538832 -0.2280093
## [11,]  0.41623642  1.22374326  1.14694921  0.01156511 -1.1880485
## [12,]  0.24584725  0.39126485  0.34376349  0.11275987 -0.1080044
## [13,] -0.04624849 -0.37184035 -0.13814794 -0.89918766  0.2520103
## [14,]  0.87872134 -0.57995996 -0.45942223 -0.49440865 -1.0680436
## [15,]  1.68198462  0.94625045  0.77212921  0.27178019 -1.3080534
## [16,] -0.80082915 -0.64933316 -0.67360509  0.14167265  0.8520348
## [17,] -0.77648783  1.70935566  1.57531493  0.05493429 -0.3480142
## [18,] -0.36268554 -0.51058676 -0.51296795 -0.03180407  1.0920446
## [19,]  0.05111675  0.04439885 -0.19169366  0.98014347 -1.3080534
## [20,]  0.12414069 -0.78807956 -0.51296795 -1.18831553 -0.7080289
## [21,] -0.41136816  0.25251845  0.07603492  0.60427724 -0.2280093
## [22,] -0.80082915 -0.37184035 -0.40587652  0.01156511  0.8520348
## [23,] -1.36067931  0.18314525  0.07603492  0.14167265  0.6120250
## [24,] -0.36268554 -0.51058676 -0.62005938  0.69101560  1.0920446
## [25,]  2.48524789  1.15437006  2.59268352 -3.21221059 -1.4280583
## [26,] -0.31400292  0.66875765  0.50440063  0.19949823  0.3720152
## [27,] -0.75214652 -1.06557236 -0.99487938 -0.33538832  1.3320544
## [28,] -1.45804455 -0.09434755 -0.29878509  0.80666675  1.2120495
## [29,]  1.90105642  0.11377205 -0.08460223  0.71992839 -1.4280583
## [30,] -0.75214652  0.25251845  0.23667206 -0.16191160  0.2520103
## [31,] -0.16795505 -1.41243836 -1.53033653  0.92231789  1.3320544
## [32,]  0.02677544  0.04439885 -0.13814794  0.47416970 -0.2280093
## [33,]  0.83003872 -0.16372075  0.29021777 -1.79548405 -1.4280583
## [34,] -0.46005079 -1.13494556 -1.10197081  0.19949823 -1.0680436
## [35,] -0.41136816 -0.99619916 -0.78069652 -0.71125455 -0.1080044
## [36,]  2.36354134 -0.37184035 -0.24523937 -0.50886504 -1.4280583
## [37,]  0.07545807 -0.09434755  0.29021777 -1.33287946 -0.2280093
## [38,] -0.21663767  1.36248966  1.20049493  0.11275987 -0.8280338
## [39,] -0.77648783 -1.20431876 -1.31615367  0.90786150  0.7320299
## [40,]  0.05111675 -0.78807956 -0.78069652  0.19949823  0.4920201
## [41,] -0.55741603 -0.99619916 -1.04842510  0.45971331 -0.2280093
## [42,]  0.24584725 -1.20431876 -1.42324510  1.96317821  1.5720642
## [43,]  0.34321249 -1.06557236 -0.94133367 -0.50886504 -1.3080534
## [44,] -0.46005079 -0.99619916 -0.88778795 -0.49440865 -0.1080044
## [45,] -0.07058980 -1.48181156 -1.63742796  1.38492248  1.3320544
## [46,]  3.38587642 -0.44121355  0.55794635 -3.19775420 -1.4280583
## [47,] -0.75214652 -1.34306516 -1.31615367  0.28623659  1.9320789
## [48,] -1.26331406  0.39126485 -0.03105651  1.39937887  1.0920446
## [49,] -1.28765537  0.04439885  0.12958063 -0.50886504 -0.8280338
## [50,] -0.80082915 -0.92682596 -0.99487938  0.35851855  0.6120250
##              X3PA        X3PP         ORB         DRB         TRB
##  [1,]  0.13036473 -0.15749098 -0.27213551 -0.34656760 -0.32874654
##  [2,]  0.69716790  0.02738974 -0.06117775  1.00809403  0.66053702
##  [3,]  0.41376631  0.07532177 -0.27213551 -0.23367913 -0.25264780
##  [4,] -0.53090563  0.13010124 -0.27213551  1.57253638  0.96493197
##  [5,]  1.64183984  0.43823576 -0.69405103 -0.06434643 -0.29069717
##  [6,] -0.05856966  0.56148957 -0.37761439  0.55654016  0.24199398
##  [7,]  1.02780308  0.51355754 -0.79952991 -0.85456572 -0.86143769
##  [8,] -1.47557758 -2.32812750  2.15387873  1.68542485  1.95421553
##  [9,] -0.43643844  0.16433841 -0.48309327  0.10498628 -0.13849970
## [10,] -0.29473765  0.21227045  0.25525889  0.38720745  0.35614208
## [11,] -1.23940959 -0.65050621  1.52100545  1.51609215  1.57372185
## [12,] -0.20027045  0.25335505 -0.06117775 -0.62878878 -0.44289464
## [13,]  0.31929912  0.07532177  1.09908993  1.12098250  1.15517880
## [14,] -1.09770880 -0.32867682 -1.01048767 -1.02389842 -1.05168452
## [15,] -1.47557758  0.60942161  0.88813217  0.78231709  0.85078386
## [16,]  0.93333588  0.13694868 -0.58857215 -0.51590031 -0.55704274
## [17,] -0.01133606 -0.33552426 -1.01048767 -0.91100995 -0.97558579
## [18,]  1.21673747  0.17803328 -0.69405103 -1.13678689 -1.01363516
## [19,] -1.38111039 -0.68474338  0.78265329  1.06453827  1.00298133
## [20,] -0.62537283 -0.06847434  1.83744209  1.34675944  1.61177122
## [21,] -0.05856966 -0.20542301  0.46621665 -0.06434643  0.12784588
## [22,]  0.79163509  0.30813452 -0.48309327 -0.17723490 -0.29069717
## [23,]  0.93333588 -0.01369487 -0.16665663 -0.34656760 -0.29069717
## [24,]  0.88610229  0.47932037 -0.79952991 -0.96745418 -0.89948705
## [25,] -1.52281118 -2.32812750  3.10318865  2.87075378  3.01959782
## [26,]  0.27206552  0.34921912 -0.69405103  0.27431898 -0.10045033
## [27,]  1.50013905  0.16433841  0.04430113  0.21787475  0.12784588
## [28,]  1.35843826  0.13694868 -1.01048767 -1.08034265 -1.08973389
## [29,] -1.52281118  4.51930632  1.31004769  1.51609215  1.53567248
## [30,]  0.17759833  0.39030373 -0.69405103 -0.34656760 -0.48094401
## [31,]  1.40567186  0.17803328 -0.90500879 -1.47545230 -1.31803010
## [32,]  0.03589753 -0.20542301 -1.01048767 -1.13678689 -1.12778326
## [33,] -1.52281118 -2.32812750  1.41552657  1.96764603  1.84006742
## [34,] -1.05047520 -0.21911788 -0.79952991 -0.40301184 -0.59509211
## [35,] -0.05856966  0.07532177  0.04430113 -0.17723490 -0.13849970
## [36,] -1.52281118  1.09558941  1.94292097  1.06453827  1.42152438
## [37,] -0.01133606 -0.22596532 -0.27213551  0.04854204 -0.06240096
## [38,] -0.62537283 -0.59572674 -0.37761439  0.04854204 -0.13849970
## [39,]  0.64993430  0.30128709 -0.69405103 -0.79812148 -0.78533895
## [40,]  0.27206552  0.52725240 -0.69405103 -0.57234454 -0.63314148
## [41,] -0.20027045  0.19172815 -0.69405103 -1.19323112 -1.05168452
## [42,]  1.40567186  0.47247293 -0.58857215 -1.02389842 -0.89948705
## [43,] -1.38111039 -0.61626904  1.62648433  0.04854204  0.62248766
## [44,] -0.01133606  0.03423717 -0.27213551 -0.34656760 -0.29069717
## [45,]  0.93333588  0.65735365 -0.79952991 -0.91100995 -0.93753642
## [46,] -1.52281118 -2.32812750  1.20456881  0.78231709  0.96493197
## [47,]  1.78354063  0.39030373 -0.16665663  0.04854204 -0.06240096
## [48,]  0.83886869  0.46562550 -1.01048767 -1.08034265 -1.08973389
## [49,] -0.71984002 -0.39030373 -0.90500879 -1.13678689 -1.08973389
## [50,]  0.93333588 -0.10271151 -0.06117775 -1.08034265 -0.74728958
##               AST         STL          BLK         TO          PF
##  [1,]  1.65224666  2.55823818  1.206464582  1.7904454 -0.29845679
##  [2,]  1.51614727  1.36725206  0.862742479  1.0596514 -1.39037189
##  [3,]  0.47271857  0.89085761 -0.168423831  0.3288573 -0.29845679
##  [4,] -0.66144306 -0.77652295  0.347159324 -0.9500323 -0.48044264
##  [5,] -0.52534367 -0.30012850  1.378325634  0.1461588  1.15743000
##  [6,] -0.47997720  0.41446317  0.175298273  1.0596514 -1.20838604
##  [7,] -0.52534367  0.17626595 -0.684006985  0.8769529 -0.29845679
##  [8,] -1.02437478 -0.77652295  1.893908788 -1.1327308  0.61147245
##  [9,]  3.24007294  3.98742152 -0.855868037  1.0596514  0.42948660
## [10,] -0.20777841 -0.06193128 -0.340284882  1.0596514  0.97544415
## [11,] -0.61607660 -0.53832572  0.690881427 -0.2192382  0.06551491
## [12,]  0.56345150 -0.06193128 -0.512145934 -0.9500323 -1.57235774
## [13,] -0.88827539  0.17626595 -0.512145934 -1.6808263  0.42948660
## [14,]  1.38004787 -0.53832572 -0.855868037  0.3288573 -1.75434359
## [15,] -0.84290892 -0.53832572  0.862742479  0.6942543  1.15743000
## [16,]  0.88101675 -0.06193128 -0.684006985  0.1461588 -0.48044264
## [17,]  1.38004787  1.36725206 -0.684006985  1.2423499 -0.11647094
## [18,] -0.52534367 -0.06193128 -0.855868037 -1.4981278 -1.93632944
## [19,] -0.70680953 -1.25291739  0.519020376 -0.5846352  0.42948660
## [20,] -0.79754246 -0.53832572 -0.512145934 -0.2192382  0.42948660
## [21,]  0.20051978  1.12905484 -0.512145934  1.2423499  0.06551491
## [22,]  0.38198564 -0.30012850 -0.168423831 -0.5846352  0.79345830
## [23,]  1.19858201  0.89085761 -0.168423831  2.7039380  0.24750076
## [24,] -0.20777841 -0.53832572 -0.512145934 -0.0365397 -0.48044264
## [25,] -1.11510771 -0.30012850  3.956241407  1.0596514  1.70338755
## [26,] -0.11704548 -0.30012850 -0.512145934  0.6942543  0.42948660
## [27,] -1.11510771  0.17626595 -0.512145934 -0.4019367  1.15743000
## [28,]  0.24588624 -0.53832572 -0.684006985 -0.2192382 -1.93632944
## [29,] -0.93364185 -1.72931184  2.237630892  1.0596514  1.52140170
## [30,] -0.66144306 -0.77652295 -0.684006985 -0.7673338  1.15743000
## [31,] -0.20777841  0.41446317 -0.512145934 -1.4981278 -1.02640019
## [32,]  3.10397355 -0.06193128 -0.512145934  1.7904454 -0.84441434
## [33,] -0.16241195 -1.49111462  1.893908788 -0.4019367 -0.29845679
## [34,] -0.07167901  1.12905484 -0.512145934  0.5115558  0.42948660
## [35,] -0.97900832  0.17626595  0.175298273  0.3288573  0.61147245
## [36,] -0.16241195 -1.25291739  0.690881427 -0.9500323 -0.66242849
## [37,]  0.65418443  1.12905484 -0.340284882  0.5115558 -1.02640019
## [38,] -0.93364185 -0.53832572 -0.684006985 -0.0365397  2.43133095
## [39,] -0.29851134 -0.06193128 -0.684006985  0.6942543  0.06551491
## [40,] -0.29851134 -0.06193128 -0.512145934 -0.5846352 -0.29845679
## [41,]  0.24588624 -1.25291739 -0.855868037 -0.7673338  0.24750076
## [42,] -0.47997720 -0.53832572 -0.684006985 -1.3154293 -0.84441434
## [43,] -0.88827539 -0.30012850  0.690881427 -1.6808263  0.24750076
## [44,] -1.02437478 -0.06193128  0.003437221 -1.3154293  0.24750076
## [45,]  0.10978685 -0.53832572 -0.855868037 -0.4019367  0.42948660
## [46,] -0.97900832 -1.01472017  1.378325634 -0.4019367  1.70338755
## [47,] -0.57071013 -0.30012850  0.003437221 -0.7673338  0.06551491
## [48,]  1.15321554  0.17626595 -0.684006985 -0.4019367 -0.84441434
## [49,]  0.51808503  0.89085761 -0.855868037  0.3288573 -1.75434359
## [50,]  0.10978685  0.41446317 -0.855868037 -0.9500323  0.61147245
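The MARGIN argument decides whether FUN receives rows or columns, and anything after FUN is passed on to it. A small sketch on a 2 x 3 matrix:

```r
m <- matrix(1:6, nrow = 2)   # row 1: 1 3 5, row 2: 2 4 6

apply(m, 1, sum)   # row sums:    9 12
apply(m, 2, sum)   # column sums: 3 7 11

# Extra arguments after FUN go to FUN (the "..." slot)
apply(m, 2, quantile, probs = 0.5)   # column medians: 1.5 3.5 5.5
```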

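Heatmap with scaled data

nba.m <- melt(nba, id.vars = "Name") # long format of the scaled data
ggplot(nba.m, aes(variable, Name)) + #aes(x,y)
    geom_tile(aes(fill = value),colour = "white")+ # geom_tile: one colored tile per cell
    scale_fill_gradient(low = "white",high = "steelblue") # low values white, high values steel blue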

Treemap

Requires the treemapify package

Treemap - data preparation

if (!require('treemapify')){
    library(devtools) # install.packages("devtools") first if it is not installed yet
    install_github("wilkox/treemapify") # install the development version from GitHub (needs devtools)
    library(treemapify)
}
data(G20) # example data
head(G20)
##          Region        Country Trade.mil.USD Nom.GDP.mil.USD   HDI
## 1        Africa   South Africa        208000          384315 0.629
## 2 North America  United States       3969000        15684750 0.937
## 3 North America         Canada        962600         1819081 0.911
## 4 North America         Mexico        756800         1177116 0.775
## 5 South America         Brazil        494800         2395968 0.730
## 6 South America      Argentina        152690          474954 0.811
##   Population Economic.classification
## 1   53000000              Developing
## 2  316173000                Advanced
## 3   34088000                Advanced
## 4  112211789              Developing
## 5  201032714              Developing
## 6   40117096              Developing

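Treemap - setting the treemap parameters

# treemapify: converts the data into the format a treemap needs
# area: what sets each rectangle's area (GDP)
# fill: what sets the fill color (HDI)
# label: what labels each rectangle (Country)
# group: how rectangles are grouped (Region)
treeMapCoordinates <- treemapify(data=G20, # data source (G20)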
    area = "Nom.GDP.mil.USD",fill = "HDI",
    label = "Country",group = "Region")
head(treeMapCoordinates)
##    fill           label     xmin     xmax     ymin      ymax         group
## 1 0.876  European Union  0.00000 38.66972  0.00000  58.99641        Europe
## 2 0.920         Germany 38.66972 63.32079  0.00000  19.17284        Europe
## 3 0.893          France 38.66972 63.32079 19.17284  33.88097        Europe
## 4 0.875  United Kingdom 38.66972 63.32079 33.88097  47.64081        Europe
## 5 0.881           Italy 38.66972 63.32079 47.64081  58.99641        Europe
## 6 0.937   United States  0.00000 53.16491 58.99641 100.00000 North America
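treemapify() turns each row into a rectangle (xmin, xmax, ymin, ymax) whose area is proportional to the area variable. The simplest layout with that property is slice-and-dice: cut the canvas into vertical strips whose widths are proportional to the values. The sketch below implements only that one-level slice layout on a 100 x 100 canvas; it is not the squarified, grouped layout treemapify is built around, and the helper name slice_layout() is hypothetical:

```r
# Hypothetical helper: one-level slice-and-dice treemap layout
slice_layout <- function(values, labels, width = 100, height = 100) {
  widths <- width * values / sum(values)   # strip width proportional to value
  xmax <- cumsum(widths)
  xmin <- xmax - widths
  data.frame(label = labels, xmin = xmin, xmax = xmax,
             ymin = 0, ymax = height)
}

gdp <- c(50, 30, 20)   # made-up values
rects <- slice_layout(gdp, c("A", "B", "C"))
rects

# Each rectangle's share of the canvas equals its share of the values
(rects$xmax - rects$xmin) * (rects$ymax - rects$ymin) / (100 * 100)  # 0.5 0.3 0.2
```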

Treemap - plot 1

ggplotify(treeMapCoordinates)

Treemap - plot 2

Wouldn't it make more sense for larger values to get darker colors?

ggplotify(treeMapCoordinates)+ 
    scale_fill_gradient(low = "white",high = "steelblue") # set the colors for low and high values

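Homework 6 - 1

  • Data from Project Tycho®
  • University of Pittsburgh
  • Advance the availability and use of public health data for science and policy making
  • Data from all weekly notifiable disease reports for the US dating back to 1888
  • Download POLIO_Incidence.csv from the course website
  • Task: use charts to show how polio incidence changed in each US state from 1928 to 1969
  • Note: the polio vaccine became available in 1955

Homework 6 - 2

  • Task: use charts to show how polio incidence changed in each US state from 1928 to 1969
  • Problems you can expect:
    • Missing data are marked with "-" in the download; handle them before plotting (convert to NA)
    • The data are given per week and per state; sum them by state and year to get each year's total incidence
    • The table is in wide format and must be converted to long format
    • With 50 states and more than 40 years of data, how do you present it all?

Homework 6 - 3

  • Task: use charts to show how polio incidence changed in each US state from 1928 to 1969
  • Requirements:
    • Make the charts with the ggplot2 package (or packages built on it)
    • Show how incidence changes by state and by year
  • Any chart type is allowed
  • Due 5/16 (Mon) 11:59 pm

Homework 6 - 4

  • Title: Changes in polio incidence across US states, 1928-1969 (5 pt)
  • Subheading 1: Data preprocessing (20 pt)
    • Read in the data (5 pt)
    • Convert the wide table to a long table (5 pt)
    • Handle missing values (5 pt)
    • Compute the yearly incidence (5 pt)
  • Subheading 2: Visualization (80 pt)
    • Explain how you chose the chart type (10 pt)
    • Code (20 pt)
    • Charts (40 pt), graded on how easily they can be understood
    • Interpretation of the charts (10 pt)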

Homework 6 - Hint 1

# read the CSV data
polio<-read.csv("POLIO_Incidence.csv",stringsAsFactors = F)
head(polio)
##   YEAR WEEK ALABAMA ALASKA ARIZONA ARKANSAS CALIFORNIA COLORADO
## 1 1928    1       0      -       0        0       0.17     0.39
## 2 1928    2       0      -       0        0       0.15      0.2
## 3 1928    3    0.04      -       0        0       0.11        0
## 4 1928    4       0      -    0.24     0.11       0.07      0.2
## 5 1928    5       0      -    0.24        0       0.32        0
## 6 1928    6       0      -       0        0       0.22      0.1
##   CONNECTICUT DELAWARE DISTRICT.OF.COLUMBIA FLORIDA GEORGIA HAWAII IDAHO
## 1           0        0                    -       0    0.03      -     0
## 2           0        0                    -       0       0      -     0
## 3        0.06        0                    -       0       -      -     0
## 4        0.06        0                    0       0       0      -     0
## 5        0.13        0                    0       0       0      -     0
## 6           0        0                    0       0       0      -     -
##   ILLINOIS INDIANA IOWA KANSAS KENTUCKY LOUISIANA MAINE MARYLAND
## 1     0.03    0.03 0.08      0        0         0     0     0.06
## 2     0.01    0.03    -   0.22        0      0.05  0.13     0.06
## 3     0.03    0.03    -      0        0         0     0        0
## 4     0.05    0.12    0      0        0         0     0        0
## 5     0.04       0 0.04      0        0         0  0.38     0.12
## 6     0.03       0    0      0        0         0     0        0
##   MASSACHUSETTS MICHIGAN MINNESOTA MISSISSIPPI MISSOURI MONTANA NEBRASKA
## 1          0.14     0.04         0           0     0.03    0.18     0.07
## 2          0.14     0.04      0.04           0     0.06       0     0.07
## 3          0.07     0.02         0           0     0.03    0.18        0
## 4          0.02     0.02         0           0     0.06       0        0
## 5          0.02     0.04         0           0        0       0     0.15
## 6          0.05     0.06         0           0        0       0     0.07
##   NEVADA NEW.HAMPSHIRE NEW.JERSEY NEW.MEXICO NEW.YORK NORTH.CAROLINA
## 1      -             -       0.08          0     0.08              0
## 2      -             -       0.03          0     0.05           0.03
## 3      -             -          0          0     0.03              0
## 4      -             0       0.03          0     0.06              0
## 5      -             0       0.03       0.48     0.07              0
## 6      -             0          0          0     0.03              0
##   NORTH.DAKOTA OHIO OKLAHOMA OREGON PENNSYLVANIA RHODE.ISLAND
## 1            - 0.02        0   0.64            0            0
## 2         0.45    -     0.04   0.43         0.03            0
## 3            0 0.06        0   1.07         0.02            0
## 4         0.15    0     0.09   0.53         0.02            0
## 5            0 0.03        0   0.32            0            0
## 6            0 0.05     0.04   0.21         0.04            0
##   SOUTH.CAROLINA SOUTH.DAKOTA TENNESSEE TEXAS UTAH VERMONT VIRGINIA
## 1           0.06            0      0.04  0.05    0       0        -
## 2           0.06            0      0.04  0.04    0       0        -
## 3           0.35            0         0     0    0       0        -
## 4           0.23            0      0.04  0.05    0       0        -
## 5           0.17         0.15         0  0.05    0       0        -
## 6           0.06         0.29      0.04     0  0.2       0     0.04
##   WASHINGTON WEST.VIRGINIA WISCONSIN WYOMING
## 1       0.26          0.06      0.03       0
## 2       0.39          0.24      0.03       0
## 3       0.13          0.12      0.03       0
## 4       0.06          0.12         0       0
## 5       0.13          0.06      0.03       0
## 6       0.06             0      0.14       0

Homework 6 - Hint 2

library(reshape2) #for melt()
#turn the wide table into a long one: keep YEAR/WEEK as identifiers and stack the per-state columns into rows
polio.m<-melt(polio,id.vars = c('YEAR','WEEK'))
head(polio.m)
##   YEAR WEEK variable value
## 1 1928    1  ALABAMA     0
## 2 1928    2  ALABAMA     0
## 3 1928    3  ALABAMA  0.04
## 4 1928    4  ALABAMA     0
## 5 1928    5  ALABAMA     0
## 6 1928    6  ALABAMA     0
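melt() comes from the reshape2 package. To show what the wide-to-long step does, here is a minimal base-R sketch on a made-up two-state table; it swaps in base `stack()` for `melt()`, and the data values are invented for illustration only.

```r
# Toy wide table shaped like the polio data: one row per YEAR/WEEK,
# one column per state (values are made up for illustration).
wide <- data.frame(YEAR = c(1928, 1928), WEEK = c(1, 2),
                   ALABAMA = c(0, 0.04), ALASKA = c(0.1, 0.2))

# stack() turns the state columns into (values, ind), column after column;
# the id columns are repeated once per state column in the same order.
state_cols <- c("ALABAMA", "ALASKA")
stacked <- stack(wide[, state_cols])
long <- data.frame(wide[rep(seq_len(nrow(wide)), length(state_cols)),
                        c("YEAR", "WEEK")],
                   variable = stacked$ind,   # same column name melt() uses
                   value = stacked$values,
                   row.names = NULL)
print(long)
```

Each original row now appears once per state, which is the shape `aggregate()` and `ggplot()` expect in the next hints.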

Homework 6 - Hint 3

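polio.m$value[polio.m$value=="-"]<-NA #missing values: turn "-" into NA
polio.m$value<-as.numeric(as.character(polio.m$value)) #convert value to numbers (via character, in case it is a factor)
polio.sumYear<- #sum over weeks for each state and year: the total incidence for that year
    aggregate(value~YEAR+variable,data=polio.m,FUN=sum,na.rm=TRUE) #the formula interface already drops NA rows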
head(polio.sumYear)
##   YEAR variable value
## 1 1928  ALABAMA  2.39
## 2 1929  ALABAMA  2.25
## 3 1930  ALABAMA  2.57
## 4 1931  ALABAMA  2.07
## 5 1932  ALABAMA  1.38
## 6 1933  ALABAMA  1.12
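Why go through `as.character()` before `as.numeric()`? If the column was read as a factor, `as.numeric()` returns the internal level codes instead of the numbers. A small sketch on a made-up vector:

```r
# Made-up values standing in for a column where "-" marks a missing value.
raw <- factor(c("0.04", "-", "1.2", "0"))

# Pitfall: as.numeric() on a factor returns level codes, not the numbers.
codes <- as.numeric(raw)

# Safer: convert to character, mark "-" as NA, then convert to numbers.
chr <- as.character(raw)
chr[chr == "-"] <- NA
vals <- as.numeric(chr)

# Sum while skipping the missing value.
total <- sum(vals, na.rm = TRUE)
print(codes)
print(vals)
print(total)
```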

Homework 6 - A bad example: don't plot it like this!

ggplot(polio.sumYear)+ #data: polio.sumYear
    geom_line(aes(x=YEAR,y=value,color=variable))+ #geom_line: draw a line chart, one line per state
    geom_vline(xintercept = 1955,colour="black", linetype = "longdash") #1955: vaccine introduced

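Error bars

  • Finally, a little actual statistics
  • Usually drawn on bar charts and line charts
  • When comparing the means of several groups, always add error bars
  • Ways to compute the error bar
    • Standard deviation (SD): use it to show the spread of the data itself
    • Standard error (SE): use it to show the likely error of the estimated mean
    • Confidence interval (CI): use it to show how confident we are in the estimated mean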
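The three measures above differ only in what they are computed from. A minimal sketch for one group of made-up numbers, assuming SE = SD / sqrt(n) and a two-sided t-based 95% CI (which presumes roughly normal, independent observations):

```r
# Error-bar half-widths for one group of numbers.
error_bar_stats <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]                            # drop missing values first
  n <- length(x)
  s <- sd(x)                                   # SD: spread of the data itself
  se <- s / sqrt(n)                            # SE: uncertainty of the sample mean
  tcrit <- qt(1 - (1 - conf) / 2, df = n - 1)  # two-sided t critical value
  c(mean = mean(x), sd = s, se = se, ci = tcrit * se)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up data, mean 5
error_bar_stats(x)
```

The bar would then span `mean - sd` to `mean + sd`, `mean - se` to `mean + se`, or `mean - ci` to `mean + ci`, depending on which question the figure answers.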

Bar Chart without Error Bar in ggplot2

library(datasets)
library(ggplot2)
airquality$Month<-as.factor(airquality$Month) #convert Month to a factor
airquality.mean<-aggregate(Ozone~Month,airquality,mean) #monthly mean Ozone (rows with NA Ozone are dropped)
ggplot()+geom_bar(data=airquality.mean,aes(x=Month,y=Ozone),
                  stat = "identity") #stat = "identity": plot the values as given

Bar Chart with Error Bar in ggplot2

airquality.sd<-aggregate(Ozone~Month,airquality,sd) #monthly standard deviation (sd) of Ozone
airquality.eb<-merge(airquality.mean,airquality.sd,by="Month") #after merging: Ozone.x = mean, Ozone.y = sd
ggplot(data=airquality.eb)+ #data: airquality.eb
    geom_bar(aes(x=Month,y=Ozone.x),stat = "identity")+
    geom_errorbar( #ymin: lower end (mean - sd), ymax: upper end (mean + sd)
        aes(x=Month,ymin=Ozone.x-Ozone.y,ymax=Ozone.x+Ozone.y), width=.1)
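The chart above uses the SD. When the point is to compare monthly means, the slide's own guidance suggests the SE instead. A sketch, assuming SE = SD / sqrt(n) per month; it computes mean, SD and n in one `aggregate()` call and only draws the plot if ggplot2 is installed:

```r
library(datasets)

aq <- airquality                 # work on a copy, leave the built-in data alone
aq$Month <- factor(aq$Month)

# Per-month mean, SD and count of Ozone. The formula interface of
# aggregate() drops rows with NA Ozone (na.action = na.omit by default).
oz_summary <- aggregate(Ozone ~ Month, data = aq,
                        FUN = function(v) c(mean = mean(v), sd = sd(v), n = length(v)))
oz_summary <- do.call(data.frame, oz_summary)   # flatten the matrix column
names(oz_summary) <- c("Month", "mean", "sd", "n")
oz_summary$se <- oz_summary$sd / sqrt(oz_summary$n)
print(oz_summary)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(oz_summary, aes(x = Month, y = mean)) +
    geom_bar(stat = "identity") +
    geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.1) +
    ylab("Mean Ozone (ppb) +/- 1 SE")
  print(p)
}
```

Because n differs between months (NA readings are dropped), the SE bars are not simply the SD bars shrunk by one common factor.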

Clustering

Basic car data: mtcars

mtcars.mx<-as.matrix(mtcars)
mtcars.mxs<-scale(mtcars.mx) #standardize each column: (x - mean) / sd
#[, 1]  mpg     Miles/(US) gallon
#[, 2]  cyl     Number of cylinders
#[, 3]  disp    Displacement (cu.in.)
#[, 4]  hp      Gross horsepower
#[, 5]  drat    Rear axle ratio
#[, 6]  wt      Weight (1000 lbs)
#[, 7]  qsec    1/4 mile time
#[, 8]  vs      Engine (0 = V-shaped, 1 = straight)
#[, 9]  am      Transmission (0 = automatic, 1 = manual)
#[,10]  gear    Number of forward gears
#[,11]  carb    Number of carburetors
head(mtcars.mxs,3)
##                     mpg        cyl       disp         hp      drat
## Mazda RX4     0.1508848 -0.1049878 -0.5706198 -0.5350928 0.5675137
## Mazda RX4 Wag 0.1508848 -0.1049878 -0.5706198 -0.5350928 0.5675137
## Datsun 710    0.4495434 -1.2248578 -0.9901821 -0.7830405 0.4739996
##                       wt       qsec         vs       am      gear
## Mazda RX4     -0.6103996 -0.7771651 -0.8680278 1.189901 0.4235542
## Mazda RX4 Wag -0.3497853 -0.4637808 -0.8680278 1.189901 0.4235542
## Datsun 710    -0.9170046  0.4260068  1.1160357 1.189901 0.4235542
##                     carb
## Mazda RX4      0.7352031
## Mazda RX4 Wag  0.7352031
## Datsun 710    -1.1221521
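`scale()` centres each column on its mean and divides by its standard deviation, so variables measured in very different units (mpg, horsepower, weight) contribute comparably to the distances used later. A quick check by hand on the mpg column:

```r
# Standardize every column of mtcars.
z <- scale(as.matrix(mtcars))

# The same z-scores for mpg, computed by hand.
mpg_z <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)

all.equal(as.vector(z[, "mpg"]), mpg_z)   # TRUE
round(colMeans(z), 10)                     # every column mean is (numerically) 0
apply(z, 2, sd)                            # every column SD is 1
```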

Which cars / variables are similar to each other? - heatmap()

par(mar=rep(0.2,4),mfrow=c(1,1))
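heatmap(mtcars.mxs) #rows and columns are reordered by hierarchical clustering; by default heatmap() also rescales rows (scale = "row")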

How do we find similar objects / events?

Clustering organizes things that are close into groups

  • How do we define close?
  • How do we group things?
  • How do we visualize the grouping?
  • How do we interpret the grouping?

Hierarchical clustering

  • An agglomerative approach
    • Find closest two things
    • Put them together
    • Find next closest
  • Requires
    • A defined distance
    • A merging approach
  • Produces
    • A tree showing how close things are to each other
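The three steps above can be sketched as a deliberately naive loop. This is an illustration only, on made-up 2-D points, using Euclidean distance and single linkage as the assumed distance and merging approach; in practice `hclust()` does this far more efficiently.

```r
# Five made-up points in 2-D.
pts <- matrix(c(1, 1,
                1.5, 1,
                5, 5,
                5, 6,
                9, 1), ncol = 2, byrow = TRUE)
D <- as.matrix(dist(pts))                  # Euclidean distances between points
clusters <- as.list(seq_len(nrow(pts)))    # start: every point is its own cluster
merge_heights <- numeric(0)

while (length(clusters) > 1) {
  best <- c(NA, NA); best_d <- Inf
  # 1. find the two closest clusters (single linkage: closest pair of members)
  for (i in seq_along(clusters)) {
    for (j in seq_along(clusters)) {
      if (i < j) {
        d_ij <- min(D[clusters[[i]], clusters[[j]]])
        if (d_ij < best_d) { best_d <- d_ij; best <- c(i, j) }
      }
    }
  }
  # 2. put them together (best[2] > best[1], so removing it keeps best[1] valid)
  clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
  clusters[[best[2]]] <- NULL
  merge_heights <- c(merge_heights, best_d)
  # 3. repeat with the remaining clusters
}
merge_heights
hclust(dist(pts), method = "single")$height   # same heights
```

The recorded merge heights are exactly what the tree (dendrogram) draws on its vertical axis.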

How do we define close? distance

  • Most important step
    • Garbage in -> garbage out
  • Distance or similarity
    • Continuous - euclidean distance
    • Continuous - correlation similarity
    • Binary - manhattan distance
  • Pick a distance/similarity that makes sense for your problem
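Correlation similarity is not a distance by itself; one common convention (an assumption here, not the only choice) is to use 1 - correlation, so cars whose scaled profiles rise and fall together end up close:

```r
x <- scale(as.matrix(mtcars))   # standardized variables, as in the earlier slides
cor_sim <- cor(t(x))            # correlation between rows (cars), 32 x 32
d_cor <- as.dist(1 - cor_sim)   # similarity -> dissimilarity in [0, 2]

round(as.matrix(d_cor)[1:3, 1:3], 3)
```

The resulting `dist` object can be passed to `hclust()` just like the output of `dist()`.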

Example distances - Euclidean

\[\sqrt{(A_1-A_2)^2 + (B_1-B_2)^2 + \ldots + (Z_1-Z_2)^2}\]

Example distances - Manhattan

\[|A_1-A_2| + |B_1-B_2| + \ldots + |Z_1-Z_2|\]

Green line: Euclidean, Blue line: Manhattan
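Both formulas can be checked by hand against `dist()` on two made-up observations measured on three variables:

```r
p <- c(A = 1, B = 2, C = 3)
q <- c(A = 4, B = 6, C = 3)

euclid <- sqrt(sum((p - q)^2))   # sqrt(3^2 + 4^2 + 0^2) = 5
manhat <- sum(abs(p - q))        # 3 + 4 + 0 = 7

m <- rbind(p, q)
c(euclid, dist(m))                          # both 5
c(manhat, dist(m, method = "manhattan"))    # both 7
```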

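Merging approach

  • Agglomerative
    • Single linkage: distance between the closest pair of members (minimum)
    • Complete linkage: distance between the farthest pair of members (maximum)
    • Average linkage: average of all pairwise distances between the two clusters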
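The three linkage rules only differ in how the cross-cluster pairwise distances are summarized. A small sketch with two made-up 1-D clusters, checked against `hclust()`'s `"single"`, `"complete"` and `"average"` methods:

```r
g1 <- c(1, 2)
g2 <- c(6, 9)

# All cross-cluster pairwise distances |a - b|: rows g1, columns g2.
cross <- abs(outer(g1, g2, "-"))   # 5 8 / 4 7

c(single   = min(cross),    # 4
  complete = max(cross),    # 8
  average  = mean(cross))   # 6

# hclust merges 1-2 and 6-9 first; its last merge height is the linkage above.
pts1d <- c(g1, g2)
sapply(c("single", "complete", "average"),
       function(meth) max(hclust(dist(pts1d), method = meth)$height))
```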

Hierarchical clustering - hp vs. mpg

Hierarchical clustering - #1

Hierarchical clustering - #2

Hierarchical clustering - #3

Hierarchical Clustering -dist()

dist() computes pairwise distances between rows; the method = "" argument chooses how distance is measured

d<-dist(mtcars.mxs) #default: euclidean
d
##                     Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
## Mazda RX4 Wag       0.4075899                                        
## Datsun 710          3.2430644     3.1763654                          
## Hornet 4 Drive      4.4013651     4.2633265  3.4371367               
## Hornet Sportabout   3.8803542     3.8196912  5.0032747      3.0421632
## Valiant             4.8437395     4.6756447  3.8681280      0.9936969
## Duster 360          4.1895788     4.1749365  5.8959064      4.3395668
## Merc 240D           3.9972560     3.8208496  2.5014249      2.5336229
## Merc 230            4.9177375     4.6700230  3.3122031      3.2698916
## Merc 280            3.1377712     2.9882339  3.2950024      2.9859746
## Merc 280C           3.2928005     3.1170530  3.3443599      2.9705073
## Merc 450SE          3.8563035     3.7329721  5.1667877      3.2468885
## Merc 450SL          3.7264672     3.6140741  5.0139378      3.0963299
## Merc 450SLC         3.8587627     3.7280140  5.0836872      3.1350053
## Cadillac Fleetwood  5.4495167     5.2848094  6.7701575      4.6845022
## Lincoln Continental 5.4799639     5.3127593  6.8196686      4.7868961
## Chrysler Imperial   5.0972757     4.9355861  6.5250751      4.6034009
## Fiat 128            4.0243306     3.9407293  1.7832069      4.1853934
## Honda Civic         4.0533412     4.0507443  2.6458745      5.2450434
## Toyota Corolla      4.3445000     4.2722835  2.2120337      4.5513874
## Toyota Corona       4.3303364     4.2137101  2.5743286      2.1359358
## Dodge Challenger    4.1089579     4.0529524  5.1794615      3.1033485
## AMC Javelin         3.7602309     3.6846550  4.8499144      2.9346504
## Camaro Z28          4.1191859     4.1051443  5.9277646      4.6740373
## Pontiac Firebird    4.1721471     4.0882070  5.2718706      3.1815173
## Fiat X1-9           3.6110861     3.5658305  1.0646650      3.9829642
## Porsche 914-2       2.5948592     2.6591243  2.9668981      5.3633226
## Lotus Europa        3.5593956     3.6429111  2.3504814      4.8654619
## Ford Pantera L      3.6239136     3.7004541  5.6724603      6.2182516
## Ferrari Dino        2.2173337     2.3107383  4.6711840      5.7255126
## Maserati Bora       4.9757567     5.0070559  7.2893098      7.4540115
## Volvo 142E          2.9056273     2.7880036  0.9799181      3.5836208
##                     Hornet Sportabout   Valiant Duster 360 Merc 240D
## Mazda RX4 Wag                                                       
## Datsun 710                                                          
## Hornet 4 Drive                                                      
## Hornet Sportabout                                                   
## Valiant                     3.3988179                               
## Duster 360                  1.8907332 4.5959496                     
## Merc 240D                   4.6074904 2.9114013  5.6208325          
## Merc 230                    5.3601882 3.4090956  6.3074270 1.7677287
## Merc 280                    3.7603055 3.3860393  4.1159325 2.3467260
## Merc 280C                   3.8404534 3.2902323  4.2017096 2.3314789
## Merc 450SE                  1.2157212 3.3739087  1.7195474 4.6100696
## Merc 450SL                  1.0581618 3.2662012  1.7483591 4.4789673
## Merc 450SLC                 1.2772271 3.2103732  1.8280213 4.5289442
## Cadillac Fleetwood          2.8840440 4.6250214  2.4964362 6.0571351
## Lincoln Continental         2.9684612 4.7409039  2.4967056 6.1200901
## Chrysler Imperial           2.6262007 4.7149888  2.1314298 5.8475038
## Fiat 128                    5.8776367 4.7310845  6.9265472 2.8477338
## Honda Civic                 6.5054903 5.8852758  7.2973276 3.7432868
## Toyota Corolla              6.2330226 5.1096664  7.2823316 3.1877164
## Toyota Corona               4.4095161 2.5379207  5.4586640 1.8187717
## Dodge Challenger            1.0370480 3.2050198  2.1565615 4.7681748
## AMC Javelin                 0.8360206 3.1377397  2.0925400 4.4434196
## Camaro Z28                  2.3538211 5.0148655  1.0554323 5.7074187
## Pontiac Firebird            0.5475333 3.5076094  1.9842791 4.7856750
## Fiat X1-9                   5.6028144 4.4964851  6.5754940 2.7365898
## Porsche 914-2               5.5333206 5.9048663  6.1678160 4.0696390
## Lotus Europa                5.8906362 5.4091213  6.5673285 3.6797302
## Ford Pantera L              4.5704575 6.6597531  3.9756842 6.3426475
## Ferrari Dino                4.8737371 6.0393960  4.5054188 5.2004290
## Maserati Bora               5.7962447 7.6982401  4.5080940 7.6028670
## Volvo 142E                  4.9104611 4.0221723  5.5902795 2.4866260
##                      Merc 230  Merc 280 Merc 280C Merc 450SE Merc 450SL
## Mazda RX4 Wag                                                          
## Datsun 710                                                             
## Hornet 4 Drive                                                         
## Hornet Sportabout                                                      
## Valiant                                                                
## Duster 360                                                             
## Merc 240D                                                              
## Merc 230                                                               
## Merc 280            3.1736664                                          
## Merc 280C           2.9644001 0.4082884                                
## Merc 450SE          5.2829002 3.5036558 3.5446708                      
## Merc 450SL          5.1385834 3.4252318 3.4663403  0.3944266           
## Merc 450SLC         5.0939035 3.4613499 3.4571735  0.4901305  0.4172832
## Cadillac Fleetwood  6.4719931 4.8481443 4.8076108  2.3752167  2.6191633
## Lincoln Continental 6.5342344 4.8652211 4.8310582  2.4147746  2.6820009
## Chrysler Imperial   6.3195516 4.5331791 4.5497518  2.1301591  2.3833766
## Fiat 128            3.4760961 4.1044898 4.1942047  6.1027421  5.9176565
## Honda Civic         4.2793565 4.3521695 4.4596794  6.7429222  6.5608536
## Toyota Corolla      3.6385061 4.4495050 4.5273217  6.4894198  6.2849281
## Toyota Corona       2.3602221 3.0091888 2.9591531  4.5230838  4.3467458
## Dodge Challenger    5.5840526 3.9899083 4.0420607  1.2054395  1.1528213
## AMC Javelin         5.1708093 3.5713821 3.6040168  1.0549265  0.9430857
## Camaro Z28          6.3746076 4.0282873 4.1261450  2.1735238  2.2625903
## Pontiac Firebird    5.5292039 3.9725287 4.0518437  1.3039453  1.2653565
## Fiat X1-9           3.4679554 3.7810906 3.8551580  5.8249673  5.6428137
## Porsche 914-2       4.9107991 4.2059850 4.3568869  5.7510348  5.6099124
## Lotus Europa        4.6690572 4.1265650 4.3109976  6.1480650  5.9587741
## Ford Pantera L      7.0037774 4.7638360 4.9017138  4.6618197  4.6520853
## Ferrari Dino        6.0347907 3.8958003 4.0539793  4.6304461  4.5549676
## Maserati Bora       8.1527622 5.7612264 5.8672946  5.4444617  5.4646796
## Volvo 142E          3.1932948 2.7801612 2.8200779  4.9479927  4.8294785
##                     Merc 450SLC Cadillac Fleetwood Lincoln Continental
## Mazda RX4 Wag                                                         
## Datsun 710                                                            
## Hornet 4 Drive                                                        
## Hornet Sportabout                                                     
## Valiant                                                               
## Duster 360                                                            
## Merc 240D                                                             
## Merc 230                                                              
## Merc 280                                                              
## Merc 280C                                                             
## Merc 450SE                                                            
## Merc 450SL                                                            
## Merc 450SLC                                                           
## Cadillac Fleetwood    2.4458852                                       
## Lincoln Continental   2.5174149          0.2956825                    
## Chrysler Imperial     2.3303195          1.0635310           0.9080748
## Fiat 128              6.0522758          7.7973439           7.8458703
## Honda Civic           6.6876038          8.3928547           8.4234897
## Toyota Corolla        6.4199329          8.2005281           8.2527181
## Toyota Corona         4.3683285          6.1484614           6.2160567
## Dodge Challenger      1.2246720          2.8372165           2.9549058
## AMC Javelin           0.9626504          2.8988638           2.9910214
## Camaro Z28            2.3169202          2.8150659           2.7412463
## Pontiac Firebird      1.4562704          2.5701272           2.6555931
## Fiat X1-9             5.7474798          7.5268299           7.5797064
## Porsche 914-2         5.7482820          7.3428177           7.3625764
## Lotus Europa          6.1321085          7.8314188           7.8778088
## Ford Pantera L        4.7410691          5.2908702           5.2360533
## Ferrari Dino          4.6718041          5.8052611           5.8069129
## Maserati Bora         5.5262308          5.6040962           5.5359080
## Volvo 142E            4.8806503          6.4042902           6.4299677
##                     Chrysler Imperial  Fiat 128 Honda Civic Toyota Corolla
## Mazda RX4 Wag                                                             
## Datsun 710                                                                
## Hornet 4 Drive                                                            
## Hornet Sportabout                                                         
## Valiant                                                                   
## Duster 360                                                                
## Merc 240D                                                                 
## Merc 230                                                                  
## Merc 280                                                                  
## Merc 280C                                                                 
## Merc 450SE                                                                
## Merc 450SL                                                                
## Merc 450SLC                                                               
## Cadillac Fleetwood                                                        
## Lincoln Continental                                                       
## Chrysler Imperial                                                         
## Fiat 128                    7.4347239                                     
## Honda Civic                 7.9702830 1.9243376                           
## Toyota Corolla              7.8280569 0.5757917   1.7799297               
## Toyota Corona               5.9697480 3.1795988   3.9646286      3.4386884
## Dodge Challenger            2.8683778 6.1897572   6.9208270      6.5812246
## AMC Javelin                 2.8183456 5.8508670   6.4507155      6.2193563
## Camaro Z28                  2.2659922 6.9503062   7.1027162      7.2970631
## Pontiac Firebird            2.3043685 6.1149641   6.7946533      6.4857014
## Fiat X1-9                   7.2243403 0.9440123   1.8356814      1.2631917
## Porsche 914-2               6.9369982 3.2195133   2.9932093      3.4109616
## Lotus Europa                7.4655971 2.3907048   2.8622875      2.5923660
## Ford Pantera L              4.8165342 6.6166138   6.4835328      6.9120796
## Ferrari Dino                5.4553770 5.4662206   5.4859770      5.7728984
## Maserati Bora               5.2173124 8.1807755   8.1867266      8.4801671
## Volvo 142E                  6.1099484 2.1944946   2.6127269      2.6006079
##                     Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
## Mazda RX4 Wag                                                            
## Datsun 710                                                               
## Hornet 4 Drive                                                           
## Hornet Sportabout                                                        
## Valiant                                                                  
## Duster 360                                                               
## Merc 240D                                                                
## Merc 230                                                                 
## Merc 280                                                                 
## Merc 280C                                                                
## Merc 450SE                                                               
## Merc 450SL                                                               
## Merc 450SLC                                                              
## Cadillac Fleetwood                                                       
## Lincoln Continental                                                      
## Chrysler Imperial                                                        
## Fiat 128                                                                 
## Honda Civic                                                              
## Toyota Corolla                                                           
## Toyota Corona                                                            
## Dodge Challenger        4.5624114                                        
## AMC Javelin             4.1915369        0.7827694                       
## Camaro Z28              5.5640650        2.7782606   2.4813171           
## Pontiac Firebird        4.6855121        1.1942533   1.1771569  2.4529311
## Fiat X1-9               2.8771692        5.8525045   5.4949181  6.5789332
## Porsche 914-2           4.6534714        5.8698053   5.4885985  6.0073280
## Lotus Europa            4.2445533        6.1425433   5.9214212  6.6478423
## Ford Pantera L          6.7110287        5.0078711   4.7275036  3.6306118
## Ferrari Dino            5.8353128        5.0080706   4.8308278  4.5053590
## Maserati Bora           8.0606070        6.0178217   5.9396759  4.4569171
## Volvo 142E              2.7534994        5.1510061   4.7542900  5.5099681
##                     Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa
## Mazda RX4 Wag                                                            
## Datsun 710                                                               
## Hornet 4 Drive                                                           
## Hornet Sportabout                                                        
## Valiant                                                                  
## Duster 360                                                               
## Merc 240D                                                                
## Merc 230                                                                 
## Merc 280                                                                 
## Merc 280C                                                                
## Merc 450SE                                                               
## Merc 450SL                                                               
## Merc 450SLC                                                              
## Cadillac Fleetwood                                                       
## Lincoln Continental                                                      
## Chrysler Imperial                                                        
## Fiat 128                                                                 
## Honda Civic                                                              
## Toyota Corolla                                                           
## Toyota Corona                                                            
## Dodge Challenger                                                         
## AMC Javelin                                                              
## Camaro Z28                                                               
## Pontiac Firebird                                                         
## Fiat X1-9                  5.8831504                                     
## Porsche 914-2              5.8049597 2.9043943                           
## Lotus Europa               6.1638574 2.1786546     2.5613776             
## Ford Pantera L             4.7026976 6.2258629     4.6929233    5.5540990
## Ferrari Dino               5.0961648 5.1088276     3.6305579    4.2187875
## Maserati Bora              5.8678663 7.8702812     6.5745699    6.9847187
## Volvo 142E                 5.1557398 1.6207836     2.8881825    2.6646220
##                     Ford Pantera L Ferrari Dino Maserati Bora
## Mazda RX4 Wag                                                
## Datsun 710                                                   
## Hornet 4 Drive                                               
## Hornet Sportabout                                            
## Valiant                                                      
## Duster 360                                                   
## Merc 240D                                                    
## Merc 230                                                     
## Merc 280                                                     
## Merc 280C                                                    
## Merc 450SE                                                   
## Merc 450SL                                                   
## Merc 450SLC                                                  
## Cadillac Fleetwood                                           
## Lincoln Continental                                          
## Chrysler Imperial                                            
## Fiat 128                                                     
## Honda Civic                                                  
## Toyota Corolla                                               
## Toyota Corona                                                
## Dodge Challenger                                             
## AMC Javelin                                                  
## Camaro Z28                                                   
## Pontiac Firebird                                             
## Fiat X1-9                                                    
## Porsche 914-2                                                
## Lotus Europa                                                 
## Ford Pantera L                                               
## Ferrari Dino             3.0648409                           
## Maserati Bora            3.0287549    3.3719604              
## Volvo 142E               5.2160045    4.2489749     6.7446456
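The distance matrix is the input to hierarchical clustering. A sketch of the next step with `hclust()` and `cutree()`; it recomputes `mtcars.mxs` so it runs on its own, and the choice of 3 groups is arbitrary. The first merge should be the smallest entry in the table above (Cadillac Fleetwood and Lincoln Continental, about 0.30).

```r
mtcars.mxs <- scale(as.matrix(mtcars))   # same standardized matrix as above
d <- dist(mtcars.mxs)                     # Euclidean by default

hc <- hclust(d, method = "complete")      # "complete" is hclust's default linkage
plot(hc, cex = 0.6, main = "mtcars, complete linkage")

groups <- cutree(hc, k = 3)               # cut the tree into 3 groups
table(groups)
```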

Hierarchical Clustering -dist()

dist() computes pairwise distances between rows; the method = "" argument chooses how distance is measured. Available methods: "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski"
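To see how much the choice matters, here is a sketch comparing several of these methods on two made-up observations (the `p = 3` for Minkowski is an arbitrary choice):

```r
m <- rbind(a = c(0, 0, 0), b = c(1, 2, 2))

sapply(c("euclidean", "maximum", "manhattan", "canberra"),
       function(meth) as.numeric(dist(m, method = meth)))
# euclidean = sqrt(1 + 4 + 4) = 3, maximum = 2, manhattan = 1 + 2 + 2 = 5,
# canberra = sum(|a - b| / (|a| + |b|)) = 1 + 1 + 1 = 3

as.numeric(dist(m, method = "minkowski", p = 3))   # (1 + 8 + 8)^(1/3)
```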

d<-dist(mtcars.mxs, method="manhattan") #Manhattan distance
d
##                      Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
## Mazda RX4 Wag        0.5739986                                        
## Datsun 710           7.5307482     7.4779782                          
## Hornet 4 Drive      11.8673022    11.2933035  8.9891038               
## Hornet Sportabout   11.2150297    10.6410310 15.4555075      7.6572038
## Valiant             13.3741549    12.8001562 10.4828422      2.1721112
## Duster 360          11.7820302    11.8348003 19.1257503     11.5518804
## Merc 240D           10.6740916    10.1000930  5.6192926      6.6700449
## Merc 230            11.2015603    10.6275616  6.2969217      8.0647933
## Merc 280             6.4428451     5.8688464  7.9460836      6.9357072
## Merc 280C            7.0109040     6.4369054  8.1671813      6.8322283
## Merc 450SE          11.3772351    10.8032364 16.4306424      8.3704992
## Merc 450SL          10.9923430    10.4183444 15.8219044      7.7617612
## Merc 450SLC         11.6157249    11.0417262 15.9975943      7.9374512
## Cadillac Fleetwood  15.4937110    14.9197124 21.1362018     13.0760587
## Lincoln Continental 15.5001133    14.9261147 21.3216809     13.2615377
## Chrysler Imperial   14.1093081    13.5353095 20.3785676     12.8795090
## Fiat 128            10.6008480    10.5480780  3.2571280     11.3172712
## Honda Civic         11.5342698    11.4814998  5.5295172     14.5186211
## Toyota Corolla      11.8011460    11.7483760  4.4574260     12.5175692
## Toyota Corona       11.4343094    11.3815393  4.9432809      4.6837839
## Dodge Challenger    11.7697013    11.3635872 16.1780636      8.1179205
## AMC Javelin         11.1308717    10.5568731 15.0579652      7.2596615
## Camaro Z28          11.4112990    11.4640690 18.7550190     13.1262424
## Pontiac Firebird    12.0164341    11.4424355 16.2233351      8.1631919
## Fiat X1-9            9.7040821     9.6513120  2.3603620     11.0248892
## Porsche 914-2        6.7566501     7.0620336  6.9555078     15.7461264
## Lotus Europa         9.4453466     9.5268842  5.5626132     13.7565767
## Ford Pantera L       9.3826778     9.4354478 16.9134260     18.5443344
## Ferrari Dino         5.0925824     5.3599760 12.1942473     15.6059216
## Maserati Bora       13.0516657    13.1044357 20.3953857     20.7747378
## Volvo 142E           6.4916534     6.1118380  2.1516587      9.0594209
##                     Hornet Sportabout    Valiant Duster 360  Merc 240D
## Mazda RX4 Wag                                                         
## Datsun 710                                                            
## Hornet 4 Drive                                                        
## Hornet Sportabout                                                     
## Valiant                     8.4734316                                 
## Duster 360                  3.8946766 12.1281218                      
## Merc 240D                  12.8271725  7.5370146 16.4974152           
## Merc 230                   14.2219209  8.6855324 17.8921636  2.8883172
## Merc 280                   10.2477369  7.3853351 11.4415061  5.9162388
## Merc 280C                  10.6498744  6.9168290 11.5449850  5.8127599
## Merc 450SE                  2.7591785  8.9093349  4.2408037 13.8023074
## Merc 450SL                  2.3742864  8.3005969  4.1545702 13.1935693
## Merc 450SLC                 2.9976683  8.4762869  4.0810820 13.3692593
## Cadillac Fleetwood          6.7551549 13.0912154  5.5724155 18.5078668
## Lincoln Continental         6.7615572 13.5385340  5.2871147 18.6933458
## Chrysler Imperial           5.6699971 13.4557504  3.6663034 17.7502326
## Fiat 128                   18.7126355 12.7774326 22.3828783  6.5953360
## Honda Civic                19.6460573 16.0123594 23.3163001  8.4753448
## Toyota Corolla             19.9129335 13.4964618 23.5831763  7.2851949
## Toyota Corona              12.0791483  5.5395613 15.7493910  3.9460278
## Dodge Challenger            2.1295683  7.4971814  4.6309404 13.5497286
## AMC Javelin                 1.5589925  7.8870037  4.2922189 12.4296302
## Camaro Z28                  5.6304087 13.7024837  1.7357321 16.2763065
## Pontiac Firebird            0.9673259  9.1044605  4.5962782 13.5950001
## Fiat X1-9                  17.8158695 12.5186275 21.4861123  6.3365310
## Porsche 914-2              15.2265911 17.2398647 18.5386803 10.5487891
## Lotus Europa               17.1176574 15.3961666 20.6535926  9.1134753
## Ford Pantera L             11.4922713 19.6213640  8.3610834 18.9325509
## Ferrari Dino               12.6261833 16.6813785 12.4992614 14.4599283
## Maserati Bora              14.0696176 21.3509791 10.4072310 21.7751397
## Volvo 142E                 14.2763955 10.6698404 17.9466382  5.3826778
##                       Merc 230   Merc 280  Merc 280C Merc 450SE Merc 450SL
## Mazda RX4 Wag                                                             
## Datsun 710                                                                
## Hornet 4 Drive                                                            
## Hornet Sportabout                                                         
## Valiant                                                                   
## Duster 360                                                                
## Merc 240D                                                                 
## Merc 230                                                                  
## Merc 280             6.4506575                                            
## Merc 280C            6.3471786  0.5680590                                 
## Merc 450SE          15.1970558  9.9846350 10.0881139                      
## Merc 450SL          14.5883177  9.3758970  9.4793759  0.6087380           
## Merc 450SLC         14.7640077  9.5515870  9.6550658  0.8312596  0.6233818
## Cadillac Fleetwood  19.9026152 13.4519577 13.5554366  5.3547127  5.7396048
## Lincoln Continental 20.0880943 13.6374368 13.7409156  5.3611150  5.7460071
## Chrysler Imperial   19.1449810 12.6943235 12.7978023  4.5688000  5.1551534
## Fiat 128             8.3296726 10.9413721 10.8378933 19.6877704 19.0790324
## Honda Civic         10.3263627 11.8747939 12.1966223 20.6211922 20.0124542
## Toyota Corolla       9.0487018 12.1416701 12.0381913 20.8880684 20.2793304
## Toyota Corona        5.1152086  7.8415557  7.7380769 13.0542832 12.4455451
## Dodge Challenger    14.9444770 10.9702930 11.0737719  2.9849858  2.8987524
## AMC Javelin         13.8243786  9.8604148  9.9638937  2.3378753  2.2516419
## Camaro Z28          17.5214324 11.0707749 11.1742537  5.2632766  5.4018868
## Pontiac Firebird    14.9897485 11.0155645 11.5836235  2.6032544  2.4534262
## Fiat X1-9            8.0708676 10.0446062  9.9411273 18.7910044 18.1822664
## Porsche 914-2       11.5538680 12.8560687 13.4241277 16.6270333 16.2421412
## Lotus Europa        10.5629575 11.3400927 11.9081516 18.5180996 18.1332075
## Ford Pantera L      19.4260889 13.5273205 13.6307994 11.9588896 11.8726562
## Ferrari Dino        15.7454824 10.4182126 10.9862716 12.7883887 12.4034966
## Maserati Bora       23.1698881 16.7192306 16.8227095 13.4636610 13.3774276
## Volvo 142E           5.7401268  6.5051321  6.7374221 15.2515303 14.6427923
##                     Merc 450SLC Cadillac Fleetwood Lincoln Continental
## Mazda RX4 Wag                                                         
## Datsun 710                                                            
## Hornet 4 Drive                                                        
## Hornet Sportabout                                                     
## Valiant                                                               
## Duster 360                                                            
## Merc 240D                                                             
## Merc 230                                                              
## Merc 280                                                              
## Merc 280C                                                             
## Merc 450SE                                                            
## Merc 450SL                                                            
## Merc 450SLC                                                           
## Cadillac Fleetwood    5.1386075                                       
## Lincoln Continental   5.3240866          0.6409627                    
## Chrysler Imperial     4.9794635          2.3078438           1.8283598
## Fiat 128             19.2547223         24.3933298          24.5788089
## Honda Civic          20.1881441         25.3267516          25.5122307
## Toyota Corolla       20.4550203         25.5936278          25.7791069
## Toyota Corona        12.6212351         17.7598426          17.9453217
## Dodge Challenger      2.9248170          6.8363791           7.1046208
## AMC Javelin           2.1781536          6.8393128           6.8457151
## Camaro Z28            5.2261969          6.4244156           6.1391148
## Pontiac Firebird      2.9746064          5.9537504           5.9601527
## Fiat X1-9            18.3579564         23.4965639          23.6820429
## Porsche 914-2        16.8655230         21.9817460          21.9881482
## Lotus Europa         18.7565894         23.8728123          23.8792146
## Ford Pantera L       11.9982737         13.9334989          13.6481981
## Ferrari Dino         13.0268785         16.9048646          16.9112669
## Maserati Bora        13.3039393         15.9796465          15.6943457
## Volvo 142E           14.8184823         19.9570898          20.1425688
##                     Chrysler Imperial   Fiat 128 Honda Civic
## Mazda RX4 Wag                                               
## Datsun 710                                                  
## Hornet 4 Drive                                              
## Hornet Sportabout                                           
## Valiant                                                     
## Duster 360                                                  
## Merc 240D                                                   
## Merc 230                                                    
## Merc 280                                                    
## Merc 280C                                                   
## Merc 450SE                                                  
## Merc 450SL                                                  
## Merc 450SLC                                                 
## Cadillac Fleetwood                                          
## Lincoln Continental                                         
## Chrysler Imperial                                           
## Fiat 128                   23.6356956                       
## Honda Civic                24.5691174  3.8986126            
## Toyota Corolla             24.8359936  1.2002980   3.7515780
## Toyota Corona              17.0022084  7.2378713  10.4727981
## Dodge Challenger            6.5741454 19.4351916  20.3686134
## AMC Javelin                 5.7541551 18.3150932  19.2485150
## Camaro Z28                  4.7753351 22.0121470  22.9455688
## Pontiac Firebird            5.1304321 19.4804631  20.4138849
## Fiat X1-9                  22.7389296  1.4384349   3.4937318
## Porsche 914-2              20.5973431  7.9867841   7.4883665
## Lotus Europa               22.4884094  5.8442836   5.5819457
## Ford Pantera L             11.8198383 19.3102243  19.7199672
## Ferrari Dino               15.5204617 15.4513753  16.3847971
## Maserati Bora              13.8659859 23.6525137  24.5859355
## Volvo 142E                 19.1994555  4.5484570   5.4592002
##                     Toyota Corolla Toyota Corona Dodge Challenger
## Mazda RX4 Wag                                                    
## Datsun 710                                                       
## Hornet 4 Drive                                                   
## Hornet Sportabout                                                
## Valiant                                                          
## Duster 360                                                       
## Merc 240D                                                        
## Merc 230                                                         
## Merc 280                                                         
## Merc 280C                                                        
## Merc 450SE                                                       
## Merc 450SL                                                       
## Merc 450SLC                                                      
## Cadillac Fleetwood                                               
## Lincoln Continental                                              
## Chrysler Imperial                                                
## Fiat 128                                                         
## Honda Civic                                                      
## Toyota Corolla                                                   
## Toyota Corona            7.9569005                               
## Dodge Challenger        20.6354896    12.8017044                 
## AMC Javelin             19.5153912    11.6816060        1.2196513
## Camaro Z28              23.2124450    15.4908767        6.2053024
## Pontiac Firebird        20.6807611    12.8469759        2.6715321
## Fiat X1-9                2.0970640     6.9790663       18.5384257
## Porsche 914-2            8.5407612    11.7035309       15.7812627
## Lotus Europa             6.2985093    10.0899677       17.7059060
## Ford Pantera L          19.9868434    20.8169872       12.3463775
## Ferrari Dino            16.6516733    15.5367238       13.9101127
## Maserati Bora           24.8528117    23.7378623       14.1281269
## Volvo 142E               5.6365381     6.0552198       14.9989516
##                     AMC Javelin Camaro Z28 Pontiac Firebird  Fiat X1-9
## Mazda RX4 Wag                                                         
## Datsun 710                                                            
## Hornet 4 Drive                                                        
## Hornet Sportabout                                                     
## Valiant                                                               
## Duster 360                                                            
## Merc 240D                                                             
## Merc 230                                                              
## Merc 280                                                              
## Merc 280C                                                             
## Merc 450SE                                                            
## Merc 450SL                                                            
## Merc 450SLC                                                           
## Cadillac Fleetwood                                                    
## Lincoln Continental                                                   
## Chrysler Imperial                                                     
## Fiat 128                                                              
## Honda Civic                                                           
## Toyota Corolla                                                        
## Toyota Corona                                                         
## Dodge Challenger                                                      
## AMC Javelin                                                           
## Camaro Z28            5.8665809                                       
## Pontiac Firebird      2.4927415  5.7801212                            
## Fiat X1-9            17.4183272 21.1153810       18.5836971           
## Porsche 914-2        15.1424331 18.1679491       16.0279956  6.9673761
## Lotus Europa         17.0334994 20.2828613       17.9190619  5.4345617
## Ford Pantera L       11.9334660  7.5252203       12.4595972 18.4134583
## Ferrari Dino         13.2712830 12.1594541       13.4275878 14.5546093
## Maserati Bora        13.5634873 10.2659381       14.7712191 22.7557478
## Volvo 142E           13.8788532 17.5759070       15.0442231  3.6516910
##                     Porsche 914-2 Lotus Europa Ford Pantera L Ferrari Dino
## Mazda RX4 Wag                                                             
## Datsun 710                                                                
## Hornet 4 Drive                                                            
## Hornet Sportabout                                                         
## Valiant                                                                   
## Duster 360                                                                
## Merc 240D                                                                 
## Merc 230                                                                  
## Merc 280                                                                  
## Merc 280C                                                                 
## Merc 450SE                                                                
## Merc 450SL                                                                
## Merc 450SLC                                                               
## Cadillac Fleetwood                                                        
## Lincoln Continental                                                       
## Chrysler Imperial                                                         
## Fiat 128                                                                  
## Honda Civic                                                               
## Toyota Corolla                                                            
## Toyota Corona                                                             
## Dodge Challenger                                                          
## AMC Javelin                                                               
## Camaro Z28                                                                
## Pontiac Firebird                                                          
## Fiat X1-9                                                                 
## Porsche 914-2                                                             
## Lotus Europa            5.2254313                                         
## Ford Pantera L         12.2316006   16.0297667                            
## Ferrari Dino            8.8964305   11.0113428      8.0559820             
## Maserati Bora          17.0975690   19.2124813      5.7847419    8.2011385
## Volvo 142E              6.6867056    5.9981179     14.7617674   11.0355756
##                     Maserati Bora
## Mazda RX4 Wag                    
## Datsun 710                       
## Hornet 4 Drive                   
## Hornet Sportabout                
## Valiant                          
## Duster 360                       
## Merc 240D                        
## Merc 230                         
## Merc 280                         
## Merc 280C                        
## Merc 450SE                       
## Merc 450SL                       
## Merc 450SLC                      
## Cadillac Fleetwood               
## Lincoln Continental              
## Chrysler Imperial                
## Fiat 128                         
## Honda Civic                      
## Toyota Corolla                   
## Toyota Corona                    
## Dodge Challenger                 
## AMC Javelin                      
## Camaro Z28                       
## Pontiac Firebird                 
## Fiat X1-9                        
## Porsche 914-2                    
## Lotus Europa                     
## Ford Pantera L                   
## Ferrari Dino                     
## Maserati Bora                    
## Volvo 142E             19.2162737

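Hierarchical Clustering - hclust()

hclust() builds the cluster tree that plot() draws as a dendrogram. Its required argument is the distance between observations, which can be computed with dist().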

par(mar=rep(2,4),mfrow=c(1,1))
hc<-hclust(dist(mtcars.mxs)) # the method argument sets the linkage (agglomeration) method; the default is "complete"
plot(hc)

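Hierarchical Clustering - hclust(), average linkage

Same call as before, but with method="average": the distance between two clusters is the average of all pairwise distances between their members.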

par(mar=rep(2,4),mfrow=c(1,1))
hc<-hclust(dist(mtcars.mxs),method="average") # linkage = average distance between clusters
plot(hc)
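A small sketch comparing linkage methods on the same distances. mtcars.mxs is not defined in this section, so the block builds a hypothetical stand-in, mtcars_scaled, by standardizing mtcars with scale(); the cluster sizes it prints describe that stand-in, not necessarily the slides' matrix.

```r
# Stand-in for mtcars.mxs (assumption): mtcars with every column standardized
mtcars_scaled <- scale(as.matrix(mtcars))
d <- dist(mtcars_scaled)                      # Euclidean distances by default

# Build one tree per linkage method from the same distance object
methods <- c("complete", "average", "single")
trees <- lapply(methods, function(m) hclust(d, method = m))
names(trees) <- methods

# Cluster sizes when each tree is cut into 5 groups
sizes <- lapply(trees, function(h) table(cutree(h, k = 5)))
print(sizes)
```

Single linkage tends to chain observations together, so its sizes are often more lopsided than those of complete or average linkage.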

Hierarchical Clustering - cutree()

clusterCut <- cutree(hc, k=5) # cut the tree into 5 clusters
sort(clusterCut)
##           Mazda RX4       Mazda RX4 Wag       Porsche 914-2 
##                   1                   1                   1 
##        Ferrari Dino          Datsun 710            Fiat 128 
##                   1                   2                   2 
##         Honda Civic      Toyota Corolla           Fiat X1-9 
##                   2                   2                   2 
##        Lotus Europa          Volvo 142E      Hornet 4 Drive 
##                   2                   2                   3 
##             Valiant           Merc 240D            Merc 230 
##                   3                   3                   3 
##            Merc 280           Merc 280C       Toyota Corona 
##                   3                   3                   3 
##   Hornet Sportabout          Duster 360          Merc 450SE 
##                   4                   4                   4 
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood 
##                   4                   4                   4 
## Lincoln Continental   Chrysler Imperial    Dodge Challenger 
##                   4                   4                   4 
##         AMC Javelin          Camaro Z28    Pontiac Firebird 
##                   4                   4                   4 
##      Ford Pantera L       Maserati Bora 
##                   5                   5
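To see where the 5 clusters sit on the dendrogram, rect.hclust() from base stats can draw a box around each group. A sketch, again using a hypothetical scaled copy of mtcars in place of mtcars.mxs:

```r
# Stand-in for mtcars.mxs (assumption): standardized mtcars
mtcars_scaled <- scale(as.matrix(mtcars))
hc_demo <- hclust(dist(mtcars_scaled), method = "average")

cl <- cutree(hc_demo, k = 5)                 # cluster label for each car
plot(hc_demo, cex = 0.7)                     # dendrogram with car names
rect.hclust(hc_demo, k = 5, border = 2:6)    # one coloured box per cluster
table(cl)                                    # how many cars fall in each cluster
```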

HC - clusters & variables

ggplot()+geom_point(data=mtcars,
                    aes(x=hp,y=mpg,color=as.factor(clusterCut)))

Hierarchical Clustering - cutree(), 2

clusterCut <- cutree(hc, h = 4) # cut the tree at height 4 (merge distance = 4)
sort(clusterCut)
##           Mazda RX4       Mazda RX4 Wag          Datsun 710 
##                   1                   1                   1 
##            Fiat 128         Honda Civic      Toyota Corolla 
##                   1                   1                   1 
##           Fiat X1-9       Porsche 914-2        Lotus Europa 
##                   1                   1                   1 
##        Ferrari Dino          Volvo 142E      Hornet 4 Drive 
##                   1                   1                   2 
##             Valiant           Merc 240D            Merc 230 
##                   2                   2                   2 
##            Merc 280           Merc 280C       Toyota Corona 
##                   2                   2                   2 
##   Hornet Sportabout          Duster 360          Merc 450SE 
##                   3                   3                   3 
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood 
##                   3                   3                   3 
## Lincoln Continental   Chrysler Imperial    Dodge Challenger 
##                   3                   3                   3 
##         AMC Javelin          Camaro Z28    Pontiac Firebird 
##                   3                   3                   3 
##      Ford Pantera L       Maserati Bora 
##                   4                   4
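Cutting at height h undoes every merge above h, so for a monotone linkage such as complete or average the number of clusters cutree() returns should equal 1 plus the number of merge heights (hc$height) greater than h. A sketch that checks this on a hypothetical scaled copy of mtcars:

```r
# Stand-in for mtcars.mxs (assumption): standardized mtcars
mtcars_scaled <- scale(as.matrix(mtcars))
hc_demo <- hclust(dist(mtcars_scaled))        # complete linkage (default)

h <- 4
cl_h <- cutree(hc_demo, h = h)

# Each of the n - 1 merges at or below h is kept; merges above h are undone,
# so the cluster count is 1 + (number of merge heights above h)
k_from_heights <- 1 + sum(hc_demo$height > h)
c(clusters = length(unique(cl_h)), predicted = k_from_heights)
```

This gives a quick way to pick h for a desired number of clusters: sort(hc$height, decreasing = TRUE) lists the heights at which the tree splits.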

Cluster the data - heatmap(), 2

par(mar=rep(0.2,4),mfrow=c(1,1))
heatmap(mtcars.mxs)

Hierarchical clustering - hclust

distxy <- dist(mtcars.mxs)
hClustering <- hclust(distxy)
plot(hClustering)

Hierarchical clustering: summary

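  • Gives a quick overview of the observations and how they relate across the variables

  • The clustering result can change with:
    • the distance metric
    • the linkage (merging) method
    • rescaling of the variables
  • Possible problem:
    • It is not always clear where to cut the tree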
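The effect of rescaling can be checked directly: cluster the raw and the standardized mtcars, then cross-tabulate the labels. A sketch; on the raw data, large-valued columns such as disp and hp dominate the Euclidean distance.

```r
raw    <- as.matrix(mtcars)
scaled <- scale(raw)                              # each column: mean 0, sd 1

round(apply(raw, 2, sd), 2)                       # spreads differ widely between columns

cl_raw    <- cutree(hclust(dist(raw)),    k = 3)  # complete linkage on raw values
cl_scaled <- cutree(hclust(dist(scaled)), k = 3)  # same, after standardizing

# Cluster numbers are arbitrary labels: compare the groupings, not the numbers.
# If each row of the table has counts in only one column, the groupings agree.
tab <- table(raw = cl_raw, scaled = cl_scaled)
tab
```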

K-means clustering

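  • Steps
    • Choose the number of clusters
    • Compute the centroid of each cluster (at first, a starting guess)
    • Assign each object/observation to the nearest centroid
    • Recompute the centroids from the new clusters, and repeat until the assignments stop changing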
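The steps above can be sketched in a few lines of base R. This is a simplified Lloyd-style loop written for illustration (the name simple_kmeans is hypothetical); stats::kmeans() uses the Hartigan-Wong algorithm by default and should be preferred in practice.

```r
simple_kmeans <- function(X, k, max_iter = 100, seed = 1) {
  X <- as.matrix(X)
  set.seed(seed)
  # Steps 1-2: pick k observations at random as the starting centroids
  centers <- X[sample(nrow(X), k), , drop = FALSE]
  cluster <- integer(nrow(X))
  for (iter in seq_len(max_iter)) {
    # Step 3: squared Euclidean distance from every point (rows) to every centroid (columns)
    d2 <- vapply(seq_len(k),
                 function(j) rowSums(sweep(X, 2, centers[j, ])^2),
                 numeric(nrow(X)))
    new_cluster <- max.col(-d2, ties.method = "first")  # nearest centroid
    if (all(new_cluster == cluster)) break              # no change: converged
    cluster <- new_cluster
    # Step 4: move each centroid to the mean of its members
    for (j in seq_len(k)) {
      members <- X[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centers, iter = iter)
}

# Usage on standardized hp and mpg
X <- scale(mtcars[, c("hp", "mpg")])
fit <- simple_kmeans(X, k = 3)
table(fit$cluster)
```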

K-means clustering

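  • Input
    • The (numeric) data used to compute distances
    • The number of clusters
    • Initial guesses for the cluster centroids
  • Output
    • The centroid of each cluster
    • The cluster each observation is assigned to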
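The initial centroids can also be given explicitly: kmeans() accepts a matrix of starting centres as its centers argument. A sketch; the choice of rows 1, 15 and 20 as starting points is arbitrary and only for illustration.

```r
X <- scale(mtcars[, c("hp", "mpg")])   # standardized hp and mpg

init <- X[c(1, 15, 20), ]              # hypothetical starting centroids (3 cars)
fit  <- kmeans(X, centers = init)      # k is taken from nrow(init)

fit$centers                            # output: final centroid of each cluster
fit$cluster                            # output: cluster assigned to each car
```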

K-means clustering - example

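x<-scale(mtcars$hp[-1]);y<-scale(mtcars$mpg[-1]) # standardize hp and mpg, dropping the first car
plot(x,y,col="blue",pch=19,cex=2)
text(x+0.05,y+0.05,labels=labelCar) # labelCar: car-name labels, not defined in this section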

K-means, step by step (figures): starting centroids; assign to closest centroid; recalculate centroids; reassign values; update centroids

kmeans() -1

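  • Important parameters: x (numeric data), centers (number of clusters, or a matrix of starting centroids), iter.max (maximum number of iterations, default 10), nstart (number of random starts when centers is a number)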
dataFrame <- data.frame(x,y)
kmeansObj <- kmeans(dataFrame,centers=3)
names(kmeansObj)
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"
kmeansObj$cluster
##  [1] 3 3 3 2 3 2 3 3 3 3 2 2 2 2 2 2 1 1 1 3 2 2 2 2 1 1 1 2 2 2 3
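Among the returned components, totss, tot.withinss and betweenss are linked: the total sum of squares splits into a within-cluster part and a between-cluster part. A sketch that recomputes totss by hand on standardized hp and mpg (a stand-in for dataFrame that keeps all 32 cars):

```r
X <- scale(mtcars[, c("hp", "mpg")])
set.seed(1)
fit <- kmeans(X, centers = 3)

# totss: sum of squared distances from every point to the overall mean
totss_manual <- sum(sweep(X, 2, colMeans(X))^2)

c(totss = fit$totss,
  manual = totss_manual,
  within_plus_between = fit$tot.withinss + fit$betweenss)
```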

kmeans() -2

par(mar=rep(0.2,4))
plot(x,y,col=kmeansObj$cluster,pch=19,cex=2)
points(kmeansObj$centers,col=1:3,pch=3,cex=3,lwd=3)

Heatmaps

set.seed(1234)
dataMatrix <- as.matrix(dataFrame)[sample(1:12),] # first 12 cars, rows in random order
kmeansObj <- kmeans(dataMatrix,centers=3)
par(mfrow=c(1,2), mar = c(2, 4, 0.1, 0.1))
image(t(dataMatrix)[,nrow(dataMatrix):1],yaxt="n")
image(t(dataMatrix)[,order(kmeansObj$cluster)],yaxt="n")

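K-means: things to note

  • You must choose the number of clusters
  • K-means does not always give the same result; it can vary with:
    • the number of clusters
    • the number of iterations
    • the random starting centroids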
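Because the starting centroids are drawn at random, different seeds can give different solutions. The nstart argument runs several random starts and keeps the one with the lowest tot.withinss. A sketch on standardized hp and mpg; the choice of 4 clusters and 25 starts is arbitrary.

```r
X <- scale(mtcars[, c("hp", "mpg")])

# One random start per seed: the final tot.withinss may differ between seeds
fits <- lapply(1:5, function(s) { set.seed(s); kmeans(X, centers = 4) })
sapply(fits, function(f) f$tot.withinss)

# 25 random starts; kmeans() returns the best of them
set.seed(42)
best <- kmeans(X, centers = 4, nstart = 25)
best$tot.withinss
```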

Use a sum of squared errors (SSE) scree plot to choose the number of clusters

SSE: the sum of the squared distances between each member of a cluster and its cluster centroid.
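Written out, with $C_k$ the set of observations in cluster $k$ and $\mu_k$ its centroid (this is the quantity kmeans() reports as tot.withinss, the sum of withinss):

```latex
\mathrm{SSE} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

SSE always shrinks as $K$ grows (it is 0 when every point is its own cluster), which is why the scree plot looks for an "elbow" rather than a minimum.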


SSE scree plot

par(mfrow=c(1,1), mar = c(4,4,1,1)) # margins: bottom, left, top, right
wss <- (nrow(dataMatrix)-1)*sum(apply(dataMatrix,2,var)) # wss[1]: total sum of squares (everything in one cluster)
for (i in 2:(nrow(dataMatrix)-1)) {
    wss[i] <- sum(kmeans(dataMatrix,centers=i)$withinss)
}
plot(1:(nrow(dataMatrix)-1), wss, type="b", xlab="Number of Clusters",
     ylab="Within groups sum of squares")
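A variant of the scree plot, sketched on standardized hp and mpg for all 32 cars rather than the 12-row dataMatrix. Using nstart makes each point on the curve less dependent on one random start; k = 1 gives the total sum of squares, the same quantity as the first point of the loop above.

```r
X  <- scale(mtcars[, c("hp", "mpg")])
ks <- 1:10

set.seed(123)
# tot.withinss (SSE) for each candidate number of clusters
wss <- sapply(ks, function(k) kmeans(X, centers = k, nstart = 20)$tot.withinss)

plot(ks, wss, type = "b", xlab = "Number of Clusters",
     ylab = "Within groups sum of squares")
```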

Missing values

dataMatrix2 <- mtcars.mx
# Randomly insert 10 missing values (sample(1:100) only hits the first 100 cells: mpg, cyl, disp and the first 4 rows of hp)
dataMatrix2[sample(1:100,size=10,replace=FALSE)] <- NA
head(dataMatrix2,10)
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7  NA 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
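A reproducible variant: with set.seed() the same cells go missing on every run, and which(..., arr.ind = TRUE) reports their row/column positions. This sketch samples from all 352 cells of mtcars rather than only the first 100.

```r
m <- as.matrix(mtcars)                  # 32 x 11 numeric matrix
set.seed(2016)
m[sample(length(m), size = 10)] <- NA   # any of the 352 cells can be hit

colSums(is.na(m))                       # missing values per column
which(is.na(m), arr.ind = TRUE)         # row/column of each missing cell
```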

Imputing {impute}

Use the k-nearest-neighbours (kNN) method to estimate values that can fill the missing entries

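# Install once (Bioconductor package). The original instructions were
# source("https://bioconductor.org/biocLite.R"); biocLite("impute"), but biocLite is deprecated;
# current Bioconductor uses install.packages("BiocManager"); BiocManager::install("impute")
library(impute)
dataMatrix2 <- impute.knn(dataMatrix2)$data # impute.knn() returns a list; $data is the imputed matrix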
head(dataMatrix2,10)
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
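If the Bioconductor package is not available, the idea can be sketched in base R. knn_impute_simple() is a hypothetical helper, not the algorithm impute.knn() implements: for each row with missing values it finds the k closest donor rows (root-mean-square difference over the columns the row does have) and fills each gap with the donors' column mean. Columns are not rescaled here, so large-valued columns such as disp dominate the distance.

```r
knn_impute_simple <- function(m, k = 3) {
  m   <- as.matrix(m)
  out <- m
  for (i in which(rowSums(is.na(m)) > 0)) {
    miss <- is.na(m[i, ])                 # columns to fill for row i
    obs  <- !miss                         # columns row i does have
    # donors: other rows with no NA in the columns row i is missing
    ok   <- rowSums(is.na(m[, miss, drop = FALSE])) == 0
    cand <- setdiff(which(ok), i)
    if (length(cand) == 0) next           # nothing to borrow from; leave NA
    # RMS difference over row i's observed columns (ignoring donor NAs)
    d <- vapply(cand, function(j) {
      diff <- m[j, obs] - m[i, obs]
      sqrt(mean(diff^2, na.rm = TRUE))
    }, numeric(1))
    nn <- cand[order(d)][seq_len(min(k, length(cand)))]
    out[i, miss] <- colMeans(m[nn, miss, drop = FALSE])
  }
  out
}

# Usage: knock out 10 random cells of mtcars, then fill them back in
m <- as.matrix(mtcars)
set.seed(1)
m[sample(length(m), size = 10)] <- NA
filled <- knn_impute_simple(m, k = 3)
head(filled)
```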